diff and patch
2024-10-23
cat test.txt will print the content of test.txt to the screen.
However, if you run cat by itself, you seem to be “stuck” (it is waiting for you to type input).
Things to try: CTRL-C (break), CTRL-\ (really break), CTRL-D (exit).
Sometimes CTRL-D only works at the start of a new line.
However, don’t press it twice, or you’ll exit your shell ;)
Note
Mac users: this is the one time when CTRL-C really is CTRL-C (not CMD-C).
Warning
Windows users: don’t use CTRL-C for copying in gitbash.
If you thought that numbers, at least, were something you could count on being simple…
You’re wrong!!
Binary: written as 0b.....
A single 1 or 0 is called a bit. 8 bits is a byte:
There are 10 types of people in the world: those who understand binary, and those who don’t.
[Figure: byte]
Please decode which byte is in the signal above.
Hexadecimal: written as 0x..... We need characters for the numbers 10-15: A=10, B=11, C=12, D=13, E=14, F=15. Both uppercase and lowercase are used.
Two hex digits are one byte (16 * 16 = 256).
Octal: written as 0o..... or 0....
Only used in specific contexts.
Base64: uses symbols 0-9a-zA-Z (10 + 26 + 26 = 62) and two extra (often + and /).
Used to display binary data in “printable symbols”.
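As a small illustration, the sketch below encodes the five bytes of the word Raven and decodes them again. It assumes a base64 command is installed (it ships with GNU coreutils and with macOS); the variable names are only for illustration.

```shell
# Encode the 5 bytes of "Raven" as printable base64 symbols.
# printf is used instead of echo so that no trailing newline is encoded.
encoded=$(printf 'Raven' | base64)
echo "$encoded"    # UmF2ZW4=

# Decoding gives the original bytes back.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"    # Raven
```

The = at the end is padding: base64 turns every 3 bytes into 4 symbols, and 5 bytes do not fill the last group.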
UK until 1971: 4 farthings = 1 penny, 12 pence = 1 shilling, 20 shillings = £1
Note that base 12 and base 60 are nice for mathematical reasons: they have many divisors.
Generally things will have a prefix:
0b...: binary (e.g. 0b10101010)
0o....: octal. Sometimes also (annoyingly) just 0.... So 0123 !== 123 (0123 == 0o123 == ...)
0x...: hexadecimal. Often written in groups of two: 0xFF A0 01 00 45 76 7E 0E to represent bytes.
Sometimes, you can only know from context:
chmod always expects octal numbers
rgb(127, 127, 127) == #7F7F7F. So #111111 !== rgb(11, 11, 11)
… (like audio equipment / signal processors)
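The prefixes above can be tried out in a shell. This is a minimal sketch that relies on printf reading its numeric arguments the way C does; bin2dec is a hypothetical helper written for this example, not a standard command.

```shell
# printf reads numeric arguments like C: 0x.. is hex, a leading 0 is octal.
printf '%d\n' 0x7F     # 127
printf '%d\n' 0123     # 83, not 123
printf '%d\n' 123      # 123

# And the other way around: decimal to hex and to octal.
printf '%X\n' 255      # FF
printf '%o\n' 83       # 123

# Colours: rgb(127, 127, 127) written as three hex bytes.
grey=$(printf '#%02X%02X%02X' 127 127 127)
echo "$grey"           # #7F7F7F
# 0x11 is 17, so #111111 is rgb(17, 17, 17), not rgb(11, 11, 11).
printf '%d\n' 0x11     # 17

# Hypothetical helper: convert a string of bits to decimal, one digit at a time.
bin2dec() {
    bits=$1
    value=0
    while [ -n "$bits" ]; do
        digit=${bits%"${bits#?}"}    # first character of $bits
        bits=${bits#?}               # the remaining characters
        value=$((value * 2 + digit))
    done
    echo "$value"
}
bin2dec 10101010       # 170
```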
stdin: standard input (to a program). Connected to the terminal (what you type) by default, if not piped in from somewhere else.
stdout: standard output (from a program). By default printed to the terminal.
stderr: standard error (from a program). Here the program will send warnings and error messages. By default connected to the terminal.
Redirections
| pipe character, connects the stdout of the previous program to the stdin of the next program.
> sends the stdout of the previous program to a file
>> appends the stdout of the previous program to a file
< uses the content of a file as stdin to the previous program
These commands all work on stdin; most can also take a filename (or multiple), but tr only reads stdin:
cat – print content
tr abc def – replace every a with d, every b with e, etc.
wc -l – count lines
wc -c – count bytes (for plain English text, the same as characters)
grep Raven – only show lines with the word “Raven”
grep -v Raven – only show lines without the word “Raven”
grep -n Raven – -n adds line numbers
head -n 10 / tail -n 10 – show first / last 10 lines
cut -d ' ' -f 2-3 – show words 2 and 3 for each line
e.g. cat test.txt | grep hello | tr he ba
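To see the redirections and commands above working together, here is a small sketch. The file test.txt and its content are made up for this example, and everything happens in a temporary directory.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# ">" creates (or overwrites) a file from stdout; ">>" appends to it.
printf 'hello world\nhello there\ngoodbye now\n' > test.txt
printf 'hello again\n' >> test.txt

# "|" connects the stdout of one program to the stdin of the next.
cat test.txt | grep hello | tr he ba
# ballo world
# ballo tbara
# ballo again

# "<" uses the file as stdin; grep -v keeps the lines WITHOUT "hello".
grep -v hello < test.txt                # goodbye now

# grep -n adds line numbers, head takes the first lines, cut picks words.
grep -n goodbye test.txt                # 3:goodbye now
head -n 1 test.txt | cut -d ' ' -f 2    # world

# Store a result in a variable; tr removes the padding some wc versions add.
line_count=$(wc -l < test.txt | tr -d ' ')
echo "$line_count"                      # 4
```

Note that tr replaces single characters, not words: with tr he ba every h becomes b and every e becomes a, which is why "there" turns into "tbara".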
For this hands-on work, you need some files. In order to get the files, we will clone a git repo from GitHub:
# make sure you're in a directory where you want to make a subdir "raven"
git clone --branch lesson3 https://github.com/reinhrst/raven
cd raven
NOTE: Even though we use a git repo to download the files, the things we do in this playtime are NOT using git.
In the raven directory there are different versions of the poem by E.A.Poe. For now we focus on original.txt.
… raven. Is this what you expected?
… a in them.
… bird and fancy in them
One thing we have to talk about is line endings.
For normal text (in English alphabet, no accents or emojis etc) each character is 1 byte:
See the ASCII table for the full table.
There are special characters here as well:
Space (0x20; if you ever wondered about %20, now you know)
Line feed, or newline (0x0a, \n)
Carriage return (0x0d, \r or ^M)
Windows and the rest of the world disagree on what should be at the end of a line of text:
Windows: \r\n (or 0x0d 0a, or CR LF)
Linux / Mac: \n (0x0a, or LF)
(Macs used to use \r at the end of a line, but this was changed in 2001, now they use \n)
Most editors on both Windows and Linux/Mac can deal with both types of line endings these days; however, some cannot.
If you open a file with Windows line endings in an editor without support, every line seems to end in ^M before the newline.
If you open a file with Unix line endings in a (Windows) editor without support, all newlines seem to have disappeared (sometimes replaced by some character).
[Figure: Carriage returns on linux] A file with Windows line endings opened in a Mac editor. The ^M is actually a single byte (0x0d, CR).
[Figure: Unix line endings on notepad] A file with Unix line endings opened in Notepad (pre-2018; these days Notepad does the right thing).
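The bytes behind a piece of text can also be made visible in the terminal. The sketch below uses od, a standard tool that this text does not otherwise cover, so treat its use here as an assumption; the sample text is made up.

```shell
# Show the bytes of a short text in hex: -An hides the offsets, -tx1 prints byte by byte.
printf 'Hi there\n' | od -An -tx1
# prints: 48 69 20 74 68 65 72 65 0a    (0x20 is the space, 0x0a the newline)

# With a Windows line ending there is an extra 0x0d (CR) before the 0x0a (LF).
printf 'Hi there\r\n' | od -An -tx1
# prints: 48 69 20 74 68 65 72 65 0d 0a

# Keep only the hex digits, to compare the endings easily.
unix_bytes=$(printf 'Hi\n' | od -An -tx1 | tr -d ' \n')
windows_bytes=$(printf 'Hi\r\n' | od -An -tx1 | tr -d ' \n')
echo "$unix_bytes $windows_bytes"    # 48690a 48690d0a
```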
These days most editors can deal with both types of line endings; however, some always save files with one particular kind.
This is a major issue for tools like git: if you open a text file in your editor, change one word, and save it with all line endings changed, diff will think that every line has changed.
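This effect is easy to reproduce. The sketch below uses two made-up three-line files; the second part assumes GNU diff, which has the --strip-trailing-cr option.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A three-line file with unix line endings.
printf 'Once upon\na midnight\ndreary\n' > unix.txt
# The "edited" copy: one word changed, but saved with windows line endings.
printf 'Once upon\r\na midnight\r\nweary\r\n' > windows.txt

# Count the lines diff reports as removed ("<"): all 3, although only 1 word changed.
changed_lines=$(diff unix.txt windows.txt | grep -c '^<')
echo "$changed_lines"    # 3

# GNU diff can ignore the trailing CR; then only the real change remains.
real_changes=$(diff --strip-trailing-cr unix.txt windows.txt | grep -c '^<')
echo "$real_changes"     # 1
```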
The solution is some magic: by default, on Windows, git converts all line endings to \r\n on checkout (we will get to this term next week; it means taking a version from the repository and presenting it in the working directory), and converts the line endings back to \n on commit.
Tools like diff and patch also use magic, but…
sometimes with magic, things go wrong… 
If you ever find yourself in trouble:
dos2unix filename.txt converts Windows line endings to Unix line endings
unix2dos filename.txt converts Unix line endings to Windows line endings
These programs are idempotent; this means that if you run dos2unix on a file that already has Unix line endings, it will do nothing (so there is no harm in trying).
You can use the file program to find out if a file has windows line endings:
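A minimal sketch of such a check, on two made-up files. Because file and dos2unix are not installed everywhere, the example also shows stand-ins built from grep and tr; these are assumptions for illustration, not what dos2unix does internally.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'one\r\ntwo\r\n' > windows.txt
printf 'one\ntwo\n' > unix.txt

# file reports "with CRLF line terminators" for the windows file
# (guarded, in case file is not installed on this machine).
if command -v file >/dev/null; then
    file windows.txt unix.txt
fi

# Stand-in check without file: count the lines that contain a CR byte.
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr" windows.txt)
echo "$crlf_lines"    # 2

# Stand-in for dos2unix: delete every CR byte. Running it again changes nothing.
tr -d '\r' < windows.txt > converted.txt
tr -d '\r' < converted.txt > converted_twice.txt
cmp converted.txt unix.txt && cmp converted_twice.txt unix.txt && echo "identical"
```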
diff & patch
Understanding git means understanding diff and patch, which do most of the interesting work.
diff file1 file2 shows the difference between 2 files
diff --color --unified file1 file2 shows the difference between 2 files in a better format
patch original patchfile -o - applies the patchfile (result of diff) to original and prints the result
Note
… --color option – text files cannot have colour.
… diff --unified --strip-trailing-cr file1 file2 to create a patch file with the correct line endings. Note that this is only for this week; as soon as we start with git it should not be a problem anymore.
… diff --unified file1 file2 > patchfile.patch and then dos2unix patchfile.patch
Let’s use diff so we understand how it works
… orginal.txt … orginal2.txt
For the modern?.txt files, assume the following. modern.txt is the original file. Then 4 different people made some changes (independently of each other), leading to modern2.txt, modern3.txt, modern4.txt and modern5.txt.
… modern.txt and each of modern?.txt
… modern.txt to modern2.txt (how can we write stdout to a file?)
… modern3.txt. Save the result in modern23.txt.
… modern4.txt to it modern234.txt
… modern2345.txt, adding the changes of modern5.txt. Did it work? Why not?
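For reference, the basic diff and patch round trip can be sketched on two tiny files. The names old.txt, new.txt, changes.patch and patched.txt are hypothetical and not part of the raven repo.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two hypothetical versions of the same text.
printf 'Once upon\na midnight\ndreary\n' > old.txt
printf 'Once upon\na midnight\nweary\n' > new.txt

# diff exits with status 1 when the files differ, so "|| true" keeps
# scripts that stop on errors going. No --color: this goes into a file.
diff --unified old.txt new.txt > changes.patch || true
cat changes.patch

# Apply the patch to old.txt and write the result to patched.txt
# (guarded, in case patch is not installed on this machine).
if command -v patch >/dev/null; then
    patch -o patched.txt old.txt changes.patch
    cmp patched.txt new.txt && echo "patched.txt is identical to new.txt"
fi
```

In the patch file, lines starting with - are removed from the original, lines starting with + are added, and lines starting with a space are unchanged context that patch uses to find the right place.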