Character Array Pointers | C/C++

Hello!

I was watching the Pointers in C/C++ course on freeCodeCamp’s YouTube channel, and I came across a snippet whose behavior I couldn’t replicate.

TimeStamp

At this point, where the null character at the end of character arrays is explained, the instructor says that a character array must always be one byte longer than the string it holds, because a null character is needed to terminate the string. In the video, assigning a 4-character string literal to a 4-element character array produced a compiler error, and printing a 4-element array filled entirely with characters (no null character as the last element) produced garbage at the end.

When I tried this myself, I couldn’t replicate either behavior. Not only does the print statement terminate properly even when the character array does not end in a null character, but the compiler also does not complain when I assign a 4-character string literal to a 4-element character array.

//in course
char name[4] = "John";    //ERROR
//in my compiler
char name[4] = "John";    //NO ERROR

Is the information in the video outdated? Are modern compilers smart enough to handle character arrays without a terminating null character? Any references where I can read up on this are appreciated.

Thank you!

The null character is certainly needed. What compiler exactly are you using?

My compiler version: gcc (Ubuntu 10.3.0-1ubuntu1) 10.3.0

Compiling as C or C++? You are probably getting lucky: printf is reading past the end of the array, and the adjacent memory happens to terminate the string cleanly. It would be worth running it under Valgrind.

Compiling as C. Here’s the whole program:

#include <stdio.h>

void main() {
    char name[4] = "John";
    printf("%zu\n", sizeof(name));
    printf("%s\n", name);
}

Yeah, I’d expect that to error, but sometimes GCC accepts things it shouldn’t. I’d try Valgrind.

That is new to me. I’ll read the docs and try it out. Thanks!

It’s the go-to tool for identifying memory issues in C. I use it constantly in professional work.

I ran the program through Valgrind’s Memcheck tool, and it did not report any errors. Maybe I’m missing some options I should be including, because it seems this analyzes only the heap?

valgrind -s --log-file=valgrind-out.txt ./a.out

Valgrind Output:

==50688== HEAP SUMMARY:
==50688==     in use at exit: 0 bytes in 0 blocks
==50688==   total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==50688== 
==50688== All heap blocks were freed -- no leaks are possible
==50688== 
==50688== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Odd. I would have expected an error when printf reads past the end of the array.

I tried researching it some more and couldn’t find anything. Going to mark this for later and move on for now. Thanks again!

I just tried this code on repl.it and got the expected result (junk values printed out after “John”), though this was with Clang 7.


Running locally, I see the same results as you with both GCC 9.3 and Clang 10.

This different behaviour in different environments is a phenomenon known as ‘undefined behaviour’. When you do something whose behaviour the C language standard leaves undefined, your program may produce different results depending on your compiler and system.
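
In fact, I believe a C-versus-C++ difference is also at play here: as far as I know, C’s initialization rules allow a string literal that exactly fills a char array (the terminating null is simply dropped), while C++ rejects the same line, which may be what produced the error in the video. A minimal sketch of the distinction, assuming GCC-style diagnostics:

#include <stdio.h>

int main(void)
{
    /* Legal C (C11 6.7.9): the literal exactly fills the array and the
     * terminating '\0' is dropped. The same line fails to compile as
     * C++ (GCC reports the initializer-string as too long). */
    char name[4] = "John";

    /* The undefined behaviour starts here: %s expects a null-terminated
     * string, so printf reads past the end of the array. */
    printf("%s\n", name);
    return 0;
}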

I’m not sure why Valgrind did not detect this particular undefined behaviour though.
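
If I had to guess, it’s because Memcheck tracks the exact bounds of heap allocations but not of local arrays on the stack, so a small over-read of name never leaves valid stack memory. A heap version of the same bug - a minimal sketch, not something I have run here - is the kind of thing I would expect Memcheck to flag as an invalid read:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Same unterminated 4-char name, but heap-allocated this time. */
    char *name = malloc(4);
    if (name == NULL)
        return 1;
    memcpy(name, "John", 4);   /* no room for a terminating '\0' */

    /* printf walks past the 4-byte block looking for a '\0', which
     * Memcheck should report as an invalid read. */
    printf("%s\n", name);

    free(name);
    return 0;
}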


Side note: it is conventional to declare main as int main instead of void main, and to return 0 upon successful termination of your program.
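
For reference, a cleaned-up version of your program with int main and return 0 (and, while we’re at it, room for the terminating null) might look like:

#include <stdio.h>

int main(void)
{
    char name[5] = "John";           /* 4 chars + room for the '\0' */
    printf("%zu\n", sizeof(name));   /* prints 5 */
    printf("%s\n", name);            /* prints John */
    return 0;                        /* signal successful termination */
}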

char name[4] = "John";

It’s been a long time since I did C, but shouldn’t this be:

char name[] = "John";

And the compiler just knows to add in the null character? By limiting the string to 4 chars, you may be preventing it from adding the null char at the end.


As an example:

#include <stdio.h>
#include <string.h>

int main()
{
    char name1[] = "john";
    printf("%zu %s\n", strlen(name1), name1);
    // 4 john
    
    char name2[4] = "abel";
    printf("%zu %s\n", strlen(name2), name2);
    // 8 abeljohn
    // This will vary depending on what is in memory -
    // it just keeps going until it finds a null char.
    // We limited it to 4 so it couldn't add the 5th char,
    // the null char. It looks like it is name1 that is in
    // memory after it. It keeps reading through that until
    // it finds the null char. It doesn't care if it is beyond
    // the initial allotment of memory - it just keeps going
    // until it finds that null char.
    
    char name3[5] = "kate";
    printf("%zu %s\n", strlen(name3), name3);
    // 4 kate
    // This works fine because we allocated enough memory
    // to have room for the null char.
    
    return 0;
}

There can also be implementation and system differences, where some compilers in some situations try to protect you from bad choices. I think this is the annoying part of undefined behaviour - reproducibility is broken.

It’s been a long time since I’ve done C, but I think the only time I specified the length of a string was when I knew it needed to handle longer strings later. In JS, if you declare a string with 10 chars but later set it to 20 chars, no problem - JS will reallocate memory for you on the fly, behind the scenes.

C won’t do that. If you declare a string with 10 chars and then put a 20-char string in there, it will just overwrite memory that doesn’t belong to that string (very dangerous). In that case, if I know that I will never need more than 20 chars, then I declare it at that size (regardless of the initial contents) and it’s up to me to make sure we never assign a larger string. If I want the string size to be dynamic, then I need to handle that myself, allocating and deallocating as I go.

Again, it’s been a while, but that’s how I remember it.
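
Something like the sketch below is what I mean by handling it yourself - the sizes and contents are just for illustration:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Start with room for a 10-char string plus its '\0'. */
    char *s = malloc(11);
    if (s == NULL)
        return 1;
    strcpy(s, "0123456789");

    /* Growing it to 20 chars means reallocating by hand - nothing
     * happens behind the scenes like it would in JS. */
    char *bigger = realloc(s, 21);
    if (bigger == NULL) {
        free(s);
        return 1;
    }
    s = bigger;
    strcat(s, "abcdefghij");

    printf("%s\n", s);   /* 0123456789abcdefghij */

    free(s);   /* deallocating is also on us */
    return 0;
}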


Yeah, my general approach in C is to set a MAX_STRING_BUFFER constant size and then use safe string functions.
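
A minimal sketch of that approach (MAX_STRING_BUFFER and the values are just illustrative) - snprintf is the sort of safe function I mean, since it never writes past the size you give it and always null-terminates:

#include <stdio.h>

#define MAX_STRING_BUFFER 64   /* illustrative project-wide limit */

int main(void)
{
    char buf[MAX_STRING_BUFFER];

    /* snprintf writes at most sizeof(buf) bytes, including the '\0',
     * truncating instead of overflowing the buffer. */
    snprintf(buf, sizeof(buf), "Hello, %s!", "John");
    printf("%s\n", buf);
    return 0;
}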


OK, maybe it’s changed, but this was taught to me as expected behavior - at least the “reading through memory until you reach a null char” part and the “accidentally overwriting memory outside of what was allocated” part. My understanding is that those are standard C behaviors.

If you mean that the behavior of:

char str[3] = "hey";

being undefined - that may be true. I would expect it to do what I suggested: save the string without the null, because you’ve told it to allocate only 3 bytes. As you suggest, there may be some compilers that tell themselves, “I know what he was thinking…” I think that would be weird in a low-level language like C, where you manage your memory yourself, but it may be the case.

Yeah, it is expected that something will go wrong, but how it will go wrong is not defined, if that makes sense. Once you go past the end of the memory allocated for the char array, the C standard makes no assertions about what the compiler must do, so any behaviour in that situation is not defined by the standard => undefined behaviour.

I think in this case, on some systems and with some compilers, the memory given to the program at runtime is zeroed before the program runs (probably a safety best practice - this is what I was thinking of with ‘protect you from bad choices’), so the missing null is silent in some cases. The C standard doesn’t specify how blocks of memory have to be situated relative to each other, so your example will produce different results depending on the end user’s compiler and system.

Yes, definitely - it stores things in memory where it wants. I thought the behavior was specified, but the result is going to depend on where C decided to store things. My understanding is that it will always read through memory until it reaches that null char, but where it finds one will depend on the system and maybe even on that particular run.

So, my understanding is that it is unpredictable from the programmer’s perspective - unless they want to dig into the memory and figure out what is there. In that latter case, it is very predictable. (At this point we may just be quibbling over terminology.)

That was my understanding too, but it sounds like you have more recent experience, so let’s go with yours.