cs50
week 4 & 5
what is char *
this is the pointer address (in hex, base 16)
so if I say char *p
does it mean: computer, give me a random address for the variable p, which will store a string?
When he says char *,
is he referring to the memory location starting at that hex address and running up to \0, so it would include everything
in the array from the char * up to \0? Or is it just an address? Because that is different from saying it’s the
address plus the length of the array up to NUL
difference between NUL and NULL
bitmap first few characters in header file https://docs.cs50.net/2017/fall/notes/5/lecture5.html (the first image)
Is this the actual way it is laid out physically on the stick of ram?
stack- stores local variables and the functions as they are running; each one pops off when it returns and passes control back to whatever is below it in the stack
heap- global variables
environment vars= ?
he says initialized and uninitialized data refer to global variables. How is this? He also said he puts the globals in the heap; is this not a contradiction?
what is the text area, the ones and zeros of the code?

#include <string.h>

void foo(char *bar)
{
    char c[12];
    memcpy(c, bar, strlen(bar));
}

int main(int argc, char *argv[])
{
    foo(argv[1]);
}

what is the difference between main and the other functions in C?
what is int argc?
I understand that the user can overflow the buffer by typing a really long string, so if it’s going downwards from the heap it will overflow the stack and can overwrite things.
He says you can cause this to return to itself, basically hijacking the program control from main to attack the computer. Is there a way to provide a simple example?
@ min 33:40 of https://www.youtube.com/watch?v=eZQBx8YJ6Zs
he says “the return address is on the stack”
What is this return address? Is it the hex address of where main is defined?
I understand the stack as used to explain recursive calls, and hence that a recursive function without a base case will inevitably cause a stack overflow, but I don’t know what he means here.
I thought program control would just flow to the next layer down the stack when the top layer returns and pops off, but apparently it’s not as simple as that. So if anyone would care to elaborate, all help is appreciated.
I have quite a bit of experience in C, but I’m having a hard time pulling the questions out of the wall of text. You might need to make more than one post if you have several questions. I’ll take a stab at a few of the questions in separate replies below.
Regarding how main() is special: when you compile a C program, there needs to be an “entry point” where the program starts running. The toolchain links in startup code that calls a function named main to start the program, and terminates the program when main returns. This is all part of the “C runtime” that the compiler adds to your program automatically.
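To make argc concrete, here is a minimal sketch. The startup code passes main the number of command-line words in argc (the program name counts as one) and the words themselves in argv. The helper name first_arg_or is made up for illustration; it is not part of CS50 or the C library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the first command-line argument,
   or `fallback` when the user supplied none. */
const char *first_arg_or(int argc, char *argv[], const char *fallback)
{
    assert(argv != NULL);

    /* argv[0] is the program name, so a real argument needs argc >= 2. */
    if (argc < 2 || argv[1] == NULL) {
        return fallback;
    }
    return argv[1];
}
```

A main could call first_arg_or(argc, argv, "") before handing the result to foo. The program posted above passes argv[1] straight through, and argv[1] is a null pointer when the user types no argument, so strlen would crash before any overflow happens.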
Strings and char *: C doesn’t have a built-in string type. A char * is just a pointer to some address in memory (it could be on the stack, on the heap, or in static data). The compiler has no idea what’s a string and never tracks its length; the pointer only marks the beginning of the string. The various string functions (strlen, for example) start at that address and treat the memory as a string, stopping at the first null byte (\0) they find, and it’s up to every string function to honor that convention.
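As a rough illustration of the “keep going til you hit null” convention, this is approximately what a strlen-like function does. my_strlen is a made-up name and a sketch only; real library versions are more optimized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical re-implementation of the idea behind strlen. */
size_t my_strlen(const char *s)
{
    assert(s != NULL);

    const char *p = s;          /* p holds an address, nothing more */
    while (*p != '\0') {        /* walk forward one byte at a time */
        p++;
    }
    return (size_t)(p - s);     /* bytes visited before the terminator */
}
```

Note that the pointer itself carries no length. If the terminating \0 is missing, the loop simply keeps reading whatever memory comes next.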
Return address: When one function calls another, let’s say foo() calls bar(), the address of the instruction in foo that comes right after the call gets pushed on the stack. When bar returns, it pops that return address and jumps to it, so foo carries on where it left off. If bar calls another function blah(), an address inside bar gets pushed onto the stack, and so on. Arguments and local variables also live on the stack, laid out so that writing past the end of a local buffer runs toward the return address, like so:
return-address ... <---writes run this way--- ... local buffer
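The layout sketched above can be imitated safely with a toy model. This does not touch the real stack: it is just a 16-byte array in which the first 12 bytes stand in for char c[12] and the last 4 stand in for the saved return address. All names and sizes are assumptions for illustration; real frames also contain padding, saved registers, and often a canary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 12, RET_OFFSET = 12, RET_SIZE = 4, FRAME_SIZE = 16 };

/* Toy frame: bytes 0..11 model the buffer, bytes 12..15 the return address. */
struct toy_frame {
    unsigned char bytes[FRAME_SIZE];
};

void toy_frame_init(struct toy_frame *f)
{
    assert(f != NULL);
    memset(f->bytes, 0, BUF_SIZE);
    memset(f->bytes + RET_OFFSET, 0xAA, RET_SIZE);  /* pretend return address */
}

/* Copies without comparing n to BUF_SIZE, like the memcpy in foo.
   n is clamped to the whole array only so the demo stays well-defined C.
   The caller must supply at least n readable bytes in src. */
void toy_unchecked_copy(struct toy_frame *f, const unsigned char *src, size_t n)
{
    assert(f != NULL && src != NULL);
    if (n > (size_t)FRAME_SIZE) {
        n = FRAME_SIZE;
    }
    memcpy(f->bytes, src, n);
}

/* Returns 1 if the pretend return address still holds its original value. */
int toy_return_intact(const struct toy_frame *f)
{
    assert(f != NULL);
    for (size_t i = 0; i < RET_SIZE; i++) {
        if (f->bytes[RET_OFFSET + i] != 0xAA) {
            return 0;
        }
    }
    return 1;
}
```

Copying 8 bytes leaves the pretend return address alone; copying 16 bytes replaces it with whatever the input contained, which is the whole trick in miniature.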
If you can fool the program into writing more data into a buffer than the space allocated for it, then you “smash” the return address, and you can write any data you like into it. Usually this turns into a segmentation fault (or “segfault”), but if you’re clever enough, you can write the address of your own code, such as an exploit, and when the function returns, the program will dutifully jump straight to your exploit code. A simple example doesn’t exist outside of really old hacker zine articles (and most of them don’t work anymore), but this article on Wikipedia explains the concept further:
Thank you very much, you clarified a lot of things for me. I was having trouble because he explained the stack as a stack of trays in the cafeteria. I thought to myself, “well, why do they need pointers on the stack to direct back to the return address? If they are on top of each other, they would obviously be able to find the caller underneath their current location on the stack”, and then I realized that the diagrams David used in the lecture are not meant to be taken literally. The only question I still have is the following (we don’t need to open a new thread):
So what you mean to say is that the argument, in the case of the attack program, is argv[1], which is the second word the user enters on the command line. Therefore, if I exceed the bounds defined in the program for this one string, I am overflowing what is below it on the stack (the functions that make up the program in this case, and since these functions below also have pointers to where they need to return to, you could overwrite these as well)…
The arguments would be not so much the command line arguments like argv[], but arguments to functions, as in foo(bar, baz, xyzzy). The compiler knows how much space each takes, so it allocates a fixed amount on the stack, but you can also get the address of things on the stack, write arbitrary data to it, and until recently, even execute code directly on the stack. If you can overwrite the return address by writing past the bounds of a fixed buffer on the stack, you have a “buffer overflow” attack.
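For contrast, here is one way the foo from the question could refuse oversized input instead of writing past its buffer. foo_checked and its return codes are hypothetical, a sketch of the usual length check rather than the one correct fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical safer variant of foo: returns 0 on success,
   -1 if bar plus its terminator would not fit in the buffer. */
int foo_checked(const char *bar)
{
    char c[12];

    assert(bar != NULL);

    size_t len = strlen(bar);
    if (len >= sizeof c) {
        return -1;              /* refuse instead of writing past c */
    }
    memcpy(c, bar, len + 1);    /* + 1 copies the terminating '\0' too */
    return 0;
}
```

With a 12-byte buffer, an 11-character string is the longest that fits, because one byte is needed for the \0. The original memcpy(c, bar, strlen(bar)) checks nothing and also leaves the terminator out.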
The real picture is a lot more complicated, since on many platforms the first few arguments actually get passed in registers and not on the stack, but the idea is still the same. Stack smash attacks have gotten a lot more difficult to do nowadays due to extra security mechanisms like NX (prevents writable memory from being executable) and ASLR (address space layout randomization), but it’s still not impossible for creative hackers.