alister 186 days ago [-]
I'm glad to see that setjmp() and longjmp() are still allowed.

I'm just kidding by the way. For those C programmers who haven't encountered these before, it is a powerful way to do a "goto" in C. Powerful in the sense that you can jump anywhere, not limited to the same function. If it's used at all these days, it's used for exception handling.

More info:

LukeShu 186 days ago [-]
The way I see the list is that it isn't meant to be exhaustive, it's meant to be functions that contributors are likely to try to use. A new contributor might not realize that they should be using Git's internal strbuf.h instead of the libc string functions. It's a way to give them feedback on that from their compiler, before they spend more time on the patch and send it to the mailing list.
masklinn 185 days ago [-]
Also seems mostly for functions which look innocent and are likely to be used but are very easy to misuse.

setjmp/longjmp is not in this category, it might be argued to be easy to misuse but doesn't look innocent and is not likely to be used in normal codebases.

rkagerer 186 days ago [-]
Might be nice if the error messages suggested such alternatives instead of just saying "banned"?
grogenaut 186 days ago [-]
I believe the git project welcomes suggestions and discussion via the mailing list, and also accepts patches that like you suggested above.
Lerc 186 days ago [-]
I had quite some fun playing around with it to obfuscate code. I did find that jumping to a different position in the same expression broke on a lot of compilers as soon as optimization was turned on.

Here's a simple program to output a string with the characters in an unusual order.

    #include <stdio.h>
    #include <setjmp.h>

    #define J(x,y) (longjmp(x,y),0)

    int main(int argc, char **argv)
        jmp_buf j[011];
        int x,X=0;
        signed char* i = " eehcef lo u ef'oYurls " +021;


        return 0;
begriffs 186 days ago [-]
> the same expression broke on a lot of compilers as soon as optimization was turned on

C is not a "try it and see" language. If you stray from the spec then things will act differently and break between compilers. Section 7.13.2 specifies that only local variables declared as volatile will have a determinate value after a longjmp.

kazinator 186 days ago [-]
ISO C says that any local variables that are not declared volatile whose values are mutated after the context is saved with setjmp then have indeterminate values when the context is restored with longjmp. So yes, that interacts with optimization. Any variable that is cached in a register may revert to its old value when that register is restored blindly from jmp_buf.
mcherm 186 days ago [-]
You, sir, have a crafty and evil mind. Well done.
valleyer 185 days ago [-]
This was super fun to decipher!

1. The first time through this giant ternary defines 9 different small subroutines. Each subroutine can consider the value of "x" to be their argument. X is used as an index into the jmp_buf (subroutine) array during setup, but after that it's always just equal to 8

2. Subroutines 0, 1, and 2 are special. Their argument is the index of another subroutine to jump to. They mutate the character buffer state and then jump to the specified subroutine

3. Before jumping, subroutine 0 moves back a character, subroutine 1 moves forward a character, and subroutine 2 prints the current character and marks it by setting its high bit. Once a character is marked, it will never be printed again

4. All other subroutines get the current character value as their argument

5. Subroutines 3, 4, 5, 6, 7, and 8 all check, in different ways, whether the current character has been marked: x < 0, x >> 8, x & 0x80, (8 << 5) & x, x & 128, x > 0

6. Execution starts at subroutine 3

So you can annotate the code as:

    // Subroutine 0:
    //   move to next char;
    //   goto x;
    (x=setjmp(j[X++])) ? 
     J(j[x], *++i) :
    // Subroutine 1:
    //   move to prev char;
    //   goto x;
    ((x=setjmp(j[X++])) ? 
     J(x[j], *--i) :
    // Subroutine 2:
    //   print and mark (set high bit on) curr char;
    //   goto x;
    ((x=setjmp(j[X++])) ?
        J(x[j], (putchar(*i), *i+=128)) :
     // Subroutine 4:
     //   if (curr char is marked) {
     //     // goto 1(5);
     //     move to prev char;
     //     goto 5;
     //   } else {
     //     print and mark curr char;
     //     goto 3;
     //   }
     (x=setjmp(j[++X])) ?
        ((x<0) ? J(1[j],5) : J(2[j],3)) :
     // Subroutine 5:
     //   if (curr char is marked) {
     //     // goto 1(6);
     //     move to prev char;
     //     goto 6;
     //   } else {
     //     // goto 1(3);
     //     move to prev char;
     //     goto 3;
     //   }
     (x=setjmp(j[++X])) ?
        ((x >> X) ? J(1[j], 6) : J(1[j], 3)) :
     // Subroutine 6:
     //   if (curr char is marked) {
     //     // goto 1(7);
     //     move to prev char;
     //     goto 7;
     //   } else {
     //     // goto 1(5);
     //     move to prev char;
     //     goto 5;
     //   }
     (x=setjmp(j[++X])) ?
        ((x & 0x80) ? J(1[j], 7) : J(1[j], 5)) :
     // Subroutine 7:
     //   if (curr char is marked) {
     //     // goto 1(4);
     //     move to prev char;
     //     goto 4;
     //   } else {
     //     print and mark curr char;
     //     goto 8;
     //   }
     (x=setjmp(j[++X])) ?
        (((X << 5) & x) ? J(1[j], 4) : J(2[j], X)) :
     // Subroutine 8:
     //   if (curr char is marked) {
     //     // goto 0(8);
     //     move to next char;
     //     goto 8;
     //   } else {
     //     print and mark curr char;
     //     goto 4;
     //   }
     (x=setjmp(j[++X])) ?
        ((x & 128) ? J(0[j], X) : J(2[j], 4)) :
     // Subroutine 3:
     //   if (curr char is not marked) {
     //     print and mark curr char;
     //     goto 6;
     //   } else {
     //     exit;
     //   }
     (setjmp(j[3])>=0) ?
        J(j[2],6) : 0));
    // X = 8
    // goto 3(0);
A more straightforward way of writing this would be

    #define IS_CURR_CHAR_MARKED() (*i < 0)
    #define PRINT_AND_MARK_CURR_CHAR() do { putchar(*i); *i += 128; } while (0)
    int curr_sub = 3;
    for (;;) {
        switch (curr_sub) {
            case 3:
                if (!IS_CURR_CHAR_MARKED()) {
                    curr_sub = 6;
                } else {
                    return 0;
            case 4:
                if (IS_CURR_CHAR_MARKED()) {
                    curr_sub = 5;
                } else {
                    curr_sub = 3;
            case 5:
                if (IS_CURR_CHAR_MARKED()) {
                    curr_sub = 6;
                } else {
                    curr_sub = 3;
            case 6:
                if (IS_CURR_CHAR_MARKED()) {
                    curr_sub = 7;
                } else {
                    curr_sub = 5;
            case 7:
                if (IS_CURR_CHAR_MARKED()) {
                    curr_sub = 4;
                } else {
                    curr_sub = 8;
            case 8:
                if (IS_CURR_CHAR_MARKED()) {
                    curr_sub = 8;
                } else {
                    curr_sub = 4;
Lerc 184 days ago [-]
That's quite impressive work. It becomes a bit clearer if you think of the string as a tape. Then a big case statement with Left,Right,Mark seems rather Turing complete :-)
begriffs 186 days ago [-]
> you can jump anywhere, not limited to the same function

Outside of the current function, yes. Anywhere, no.

Longjmp allows you to return to places you’ve already been and have saved with setjmp. Also attempting to longjmp to a function that has since returned is undefined behavior. Thus longjmp’s usual use case is like exceptions in other languages, going back up the call chain, skipping intermediate functions.

ryandrake 186 days ago [-]
I think reasonable people can debate this one. setjmp()/longjmp() are very useful when working with libpng [1] and jpeglib [2], as you mentioned, as a crude exception handling mechanism. I have seen these functions used in production safely for that purpose. I can imagine other horrible uses for them though. Unlike things like strcpy() which are security-holes-by-design, setjmp()/longjmp() should be in a "carefully code review the exact way these are used" list rather than a banned list.

1: 2:

slovenlyrobot 186 days ago [-]
The trouble is how they compose with the expectations of code you don't write. It's just not normal in C to expect execution of your function to abort halfway through, and so memory leaks and broken state are almost guaranteed when they're mixed with the wrong third party library.

It might work great when you wrote it, but it might not even survive the next change to the codebase, and 10 years out, who is to know what programmers will come along and plug new stuff in without noticing the stack magic, etc

edit: another aspect of the same issue is that it can be difficult or impossible to write generic longjmp-safe code for many kinds of tasks. In C++ you have stack unwinding to e.g. delete large temporary heap allocations during a complex operation, but no such facility exists in C.

megous 186 days ago [-]
longjmp will not interrupt third party code (well maybe if it uses callbacks and you use it from a callback).

setjmp/longjmp will almost certainly require tracking any resource use carefully so that resources can be released at the setjmp point, when execution gets back there.

I mostly use it for terminating the recursive descent parsing on error. Otherwise it gets tedious to check return values everywhere in hand written code.

186 days ago [-]
smcameron 186 days ago [-]
> very useful when working with libpng

There is no reason libpng needed to use setjmp as part of its API, and that it does so is (I hope) widely regarded as a bad decision. It's "useful" only in so much as that is the API it defines, but there is no reason that this had to be the case, and simply returning an error code would have been very very much preferable.

caspper69 185 days ago [-]
Didn't glibc use setjmp()/longjmp() for task switching eons ago (maybe even pre-Linux)?
slovenlyrobot 186 days ago [-]
These are used /everywhere/ in PostgreSQL, exactly for exception handling. The result isn't bad at all, but indeed, too powerful a tool for 99% of developers
CamperBob2 186 days ago [-]
Fortunately, 99% of developers, like 99% of drivers, are better than average.
cmroanirgo 186 days ago [-]
> 99% ... are better than average.

I'm thinking there's a flaw in your mathematics...

elgenie 186 days ago [-]
99% of the people on Earth have a higher than average number of legs.
saalweachter 185 days ago [-]
And a below-average number of heads.
mrbungie 186 days ago [-]
No flaw with math/stats here, that can actually happen. I think you're confusing average/mean with median (50-percentile).

PS: I'm not sure where he's getting his numbers though

sooheon 186 days ago [-]
I'm 99% sure that was tongue in cheek.
jraph 186 days ago [-]
I think that it would have been even funnier to say that 99% of developers and 99% of drivers are better than the median, but I only figured this thanks to your parent's comment.
aiCeivi9 185 days ago [-]

> 70% of respondents say they are above average

dragonwriter 185 days ago [-]
Depends which average it refers to any of the means, or the media, or the mode can be “average” (most typically, though, it means the arithmetic mean), and it's only impossible in terms of the median.
LorenPechtel 185 days ago [-]
Exactly. Anything where you have a main clump with a long tail will show this pattern--most samples will be better than the mean.
Stratoscope 186 days ago [-]
You've never been on a project that was 80% done and 80% to go?
ConceptJunkie 185 days ago [-]
And after the second 80% is finished, then you only have 80% left.
Merg 185 days ago [-]
Not native speaker here, but I think, he just meant that, most programmers and drivers thinks they are better in respective things then, they really are.

It is just like letting programmer build a house. It is just a really bad idea.

fanf2 186 days ago [-]
And Lua’s exceptions are somewhat odd because they are based on setjmp/longjmp.
masklinn 185 days ago [-]
Apparently they're also used for error handling in libpng. That's one of the reasons the libspng author started on their project.
crankylinuxuser 186 days ago [-]
And that's the perfect usage for goto. Error catching and jump to fail block...
johannes1234321 186 days ago [-]
Goto requires the fail block to be in the same frame. It can be a working error strategy for failing critical errors in such a way from a failing module. You have to be aware that lots of stuff is prone to leak (which can be avoided by i.e. using some memory arena) and locks/mutexes/... might be in unclear state, so code has to be aware of that.
userbinator 186 days ago [-]
The destination of the goto is usually the clean-up code.
johannes1234321 186 days ago [-]
That's nice for the single frame.

Let's take the example of PostgreSQL, which was mentioned in this thread: A potential implementation (I have no knowledge of pgsql internals) might be that for a single user session all memory is from a single arena, all locks and other handles are tied to the session handle, then it can be an efficient strategy to longjmp out on a critical error (i.e. IO error) and clean the arena as well as things tied to the session. instead of bubbling this up through all frames.

Mikhail_Edoshin 185 days ago [-]
It's a different use case. There was a nice article[1] about exceptions in plain C with setjmp and longjmp.


IshKebab 186 days ago [-]
You're in the 1% though right?

Funny way of saying "too error-prone for anyone sane to consider using".

186 days ago [-]
kijiki 186 days ago [-]
It is also the only way for an xlib program to survive losing its connection to the X server. xlib will let you register a callback to handle loss of connection with XSetIOErrorHandler, but unconditionally calls exit() after your callback returns. Which is extremely anti-social behavior for a library, but that is what it does.

So the solution is to setjmp before each call to an X function that might notice the connection is closed, and longjmp out of the error callback. Ugly, but it works.

These days, you should really just use XCB, or even Wayland.

LorenPechtel 185 days ago [-]
That's nuke-from-orbit level stupidity!
mlyle 186 days ago [-]
You can build coroutines from them, too, which is really cool and useful.
thechao 186 days ago [-]
I did! And the ergonomics are dreadful!

yongjik 186 days ago [-]
I don't know why I still remember these, but if you need to build coroutines in C, ucontext.h might be a better tool:

anfilt 186 days ago [-]
Those are deprecated.
thomasjames 186 days ago [-]
I had only seen them in the context of userspace threading. I wonder what mainstream libs use them. Do you know of any?
fucking_tragedy 186 days ago [-]
Generators are a subset of coroutines, and coroutines are useful in lexing and parsing.
thomasjames 185 days ago [-]
Whoops, I meant that use set/longjmp() to implement coroutines. My wording in retrospect was super unclear. Sorry!
vasama 186 days ago [-]
I think you mean fibers, and no you can't.
tlb 186 days ago [-]
Here is an implementation of coroutines on top of setjmp/longjmp:
superpermutat0r 186 days ago [-]
How does it work for arbitrary arguments? What about function pointers. What about calling a coroutine inside a coroutine?

It also has a stack managed in runtime. I'm not sure that would work for any other application than the fun example here.

vasama 186 days ago [-]
Looks more like an implementation of undefined behaviour to me.
dang 186 days ago [-]
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

monocasa 186 days ago [-]
The underlying implementations define the behavior here very well.
fanf2 186 days ago [-]
Well, I would not go that far. My coroutine hack relies on “unwarranted chumminess with the implementation” ( and there are lots of incidental things that can break it. It is best treated as an example of the principles of thread switching and stack allocation.
craftinator 186 days ago [-]
Don't be an ass you 'fiber' cretin.
186 days ago [-]
craftinator 186 days ago [-]
Don't be an ass you 'fiber' cretin.
zzo38computer 186 days ago [-]
I don't use setjmp() and longjmp() often, but sometimes I do; I use goto more often than those (but still not all the time). (I want goto in JavaScript too. I figured out a algorithm to do so, but have not implemented it.)

One use of setjmp/longjmp I have used is in ZORKMID, to deal with the debugger. At the beginning of the execute() function I have:

(The semicolon is correct; the loop body is supposed to be empty.) Normally, if you enter the debugger and then you exit the debugger to continue the execution, then it will continue from where it left off, which is likely in the middle of the execution of some instruction, and may result the same error again. But, if you change the program counter before continuing execution, then the debugger will keep track of that and will use longjmp instead when continuing execution, therefore skipping the rest of the instruction that stopped.
pault 186 days ago [-]
What can you accomplish with goto that you can't accomplish with try/catch and custom errors?
hibbelig 185 days ago [-]
> What can you accomplish with goto that you can't accomplish with try/catch and custom errors?

You can program in C. C doesn't have try/catch...

dtech 186 days ago [-]
Goto sounds like a generally bad idea in a more expressive language like Javascript. Can you give an example of some Javascript-with-goto code that would be better than vanilla Javascript?
zzo38computer 186 days ago [-]
I have sometimes encountered such thing (but I do not remember right now), but what I do know is that goto is more helpful in C than in JavaScript. That doesn't mean that goto isn't worth anything in JavaScript, but only that there are less circumstances where it is useful compared to the other stuff. (The same is true of macros.)
lcuff 186 days ago [-]
I remember setjmp() and longjmp() being the only way to do a STREAMS device driver I wrote 30 years ago. Don't know if things have changed since then. Ah, what memories.
CorvusCrypto 186 days ago [-]
Could be wrong but I believe s lot of coroutine implementations use this as well since the call preserves the frame info in the result.
gnulinux 185 days ago [-]
I had really fun implementing coroutines, (almost-zero cost) exceptions, iterators etc with setjmp and longjmp. I think they're too useful to be banned outright (especially since C has no view on stack) since there are cases where you need to either use them or inline assembly, but I agree that they're the evilest of evilest std C functions.
chris_wot 186 days ago [-]
I've always wondered why goto is actually considered harmful. I really should read the original paper by Djikstra...
GrumpyYoungMan 186 days ago [-]
Read David Tribble's "Go To Statement Considered Harmful: A Retrospective" instead: . It contains the entire text of Dijkstra's paper and goes over its meaning in a modern context on a paragraph-by-paragraph basis. It's very well written and makes clear the "encourages spaghetti code" argument, while true at the time, applied in a era where basic control flow constructs that we now take for granted did not exist in commonly used languages.
nybble41 185 days ago [-]
Thank you for the recommendation; that was a very interesting read. I find that I can't quite agree with the last "nontrivial goto" example, though. This version without explicit gotos is actually shorter, and IMHO not really any more difficult to follow:

    int parse(void)
      Token tok;
      while ((tok = gettoken()) != END)
        while (!shift(tok))
          if (!reduce(tok))
            return ERROR;
      return ACCEPT;
I will say that any language without goto, or where the use of goto is discouraged, should at least provide guaranteed tail-call elimination as an alternative. In this case the gotos could easily be converted into loops, but not every algorithm is so accommodating.
gbacon 185 days ago [-]
See also Structured programming with go to statements[0] by Donald E. Knuth.


saagarjha 186 days ago [-]
It encourages spaghetti code.
Someone1234 186 days ago [-]
Not using it can also create spaghetti code. For example the "alternatives" often suggesting for exiting a multi-tiered loop (which often add additional boilerplate, additional variables, etc that all need to be maintained/bug free).
FartyMcFarter 186 days ago [-]
Indeed. Despite not having actual goto, Java went out of their way to have labeled loops and the "break label_name" syntax in order to avoid this awkwardness.
jjuhl 186 days ago [-]
smcameron 186 days ago [-]
libpng uses setjmp and longjump as part of its freakin' API.
fortran77 186 days ago [-]
There really are use-cases for it, especially when implementing a Forth-style interpreter in C.
jlg23 186 days ago [-]
> If it's used at all these days, it's used for exception handling.

And in all of the CPS code.

Throw_Away_4384 186 days ago [-]
> If it's used at all these days, it's used for exception handling.

What about co-routines?

f00zz 186 days ago [-]
Yeah, I remember some green threads libraries based on longjmp/setjmp.
saagarjha 186 days ago [-]
I see zero use of those in the codebase, so perhaps they should be banned too. Or maybe they’re not on this list because nobody was using them already?
susam 186 days ago [-]
I have worked in large code bases (RSA NetWitness, Oracle Database, etc.) where setjmp/longjmp were used to create macros for exception handling using TRY-CATCH-ENDTRY blocks in C.
saagarjha 186 days ago [-]
Personally, I feel like wrapping exception handlers in macros makes it easy for people to do unsafe things with it like dynamically allocating memory or calling system functions that can’t handle it correctly.
peff 186 days ago [-]
I see a lot of comments to the effect of "shouldn't XYZ also be banned". The answer is that we're not necessarily trying to be exhaustive. The point is to flag common errors before we even hit review, so we add new functions mostly when somebody tries to misuse them. I don't recall anybody trying to abuse longjmp() in Git's codebase yet (and no, that's not a challenge).
Cogito 186 days ago [-]
Peff, for the people asking in the thread, is there a place where correct alternatives are suggested or demonstrated?

I know there are a few different places that talk about how to use git's internal machinery, but not sure if any are specific to these banned functions.

masklinn 185 days ago [-]
The original commits mentions git's strbuf API[0] and its xsnprintf, a variant of snprintf which asserts that the destination buffer was big enough[1] (rather than just return truncation information).

For other codebases, snprintf is the usual recommendation, and careful straight buffer manipulation (mem*) iff performances are a concern.



peff 185 days ago [-]
The commit messages that add them to banned.h discuss alternatives, though most of the explanations are Git-specific and assume you'll look elsewhere to figure out how to actually use those alternatives.
stiff 186 days ago [-]
adamnemecek 186 days ago [-]
Like essentially all C programs.
segfaultbuserr 186 days ago [-]
Every C programmer has his/her own standard library...
r3trohack3r 186 days ago [-]
I wonder about this every time I read C.

There is talk about npm/rust having massive dependency trees (and they do) because it makes taking on dependencies too easy. But I feel like C is on the opposite side of the spectrum, where managing dependencies is difficult so every C code base is rolling it’s own version of everything.

C also has an expansive standard library but it hasn’t offset re-invention of that stdlib in the ecosystem. I just see those implementations trapped inside code bases that don’t export them in a consumable way for other projects...

Personally I love writing C but rarely do because dependency management makes innovation in those code bases so time consuming for me.

jancsika 186 days ago [-]
It's bad that C devs have rolled their own awful, buggy custom solutions.

However, here's the upshot-- that awful, buggy, custom solution is guaranteed to remain compatible with the codebase that contains it. You won't find a single instance where such a dev "improved", "refactored", "optimized", or "modernized" that awful, buggy code in a way that stopped the rest of the code from working and then shipped that broken blob of junk. And if you did find that, there's no way they'd convince the rest of the devs that this is a useful thing to do in the interest of "moving forward" or whatever.

On the other hand you'll find zillions of dollars spent on systems to keep dependency management tools from causing exactly that problem.

Just to give a real-world example-- a user reported that an abandoned C++ plugin wasn't working. We did a race:

1. Three devs try to get a single C++ dependency of that plugin to work cross platform (OSX, Windows, Linux).

2. I tried to port the C++ plugin to a C plugin with no deps.

By the time I ported, tested, and shipped, those three devs were tracking down a bug with the build script of the C++ dependency on Windows.

Edit: granted the plugin itself is only about 2000 lines of code.

ShinTakuya 186 days ago [-]
Regarding remaining compatible, I would argue that this is why modern languages implement lock files and version pinning. If you don't like a particular change but need some extra functionality or a security fix you can fork the relevant libraries or extend the functionality with an extra self written library.
riquito 186 days ago [-]
It's not a guarantee that it will work, it's not that rare to encounter a minor/patch version in a library that introduced some subtle incompatible change (usually unbeknownst to the author)
ShinTakuya 186 days ago [-]
That's literally the purpose of the lock file. The lock file locks the entire dependency tree. So unless you're bumping versions or you fail to save the lock file, the entire dependency tree's versions will remain the same.

>some subtle incompatible change

In statically typed languages this normally isn't an issue. Of course I'm aware that logic can also be changed, but in that case it's up to you to write appropriate tests (or just don't bump the versions of your libraries without a good reason).

Volt 186 days ago [-]
It still sucks. I have a few Haskell projects from a few years ago that I wanted to compile on another system. So I froze the dependencies, moved the project, and tried to build. Solver failed. I gave up.
ShinTakuya 186 days ago [-]
As far as I'm aware, Haskell's dependency system doesn't have a lock file. That's the key part to keeping things stable. Version pinning alone isn't enough.
rubber_duck 186 days ago [-]
You had the unfair advantage of working on your own :)
adamnemecek 186 days ago [-]
This is a C++ dependency manager problem though.
Volt 186 days ago [-]
That's the point
oblio 185 days ago [-]
From a modern package manager perspective, C++ is almost as archaic as C.
needs 186 days ago [-]
That was very true for a long time. But now with cmake and a package manager like hunter, it became quite easy and pleasant to work with dependencies. ExternalProject_Add() may also work.
segfaultbuserr 186 days ago [-]
Creating a lightweight library would make C programming much more productive, and I'm sure many have tried, but I wonder why none of them see widespread applications, instead of writing some half-baked, ad-hoc helper functions.

One reason, I guess, is the diverse range of applications of C, another reason is the lack of advanced features like generics and templates.

But it would still be useful to create a simple library, intended for Unix-like platforms (beyond the BSD extension of stdlib). So it's more of a cultural problem?

needs 186 days ago [-]
If I had to guess why there are no such lightweight library, it is because when somehow the library API doesn't provide what you want, you cannot hack the library itself easily. If you import the source code you may have trouble building it, and maintening such "patch" is not a lot of fun.
segfaultbuserr 186 days ago [-]
Yes. I guess the fact C doesn't have advanced features like generics and templates probably have made the problem worse, making it difficult to create a "drop-in" API.
goatinaboat 186 days ago [-]
There is a powerful C library that covers nearly all use cases, you merely need to write a configuration file for it. Perhaps you’ve already heard of it? Tcl.
StreamBright 186 days ago [-]
All of them are null pointer exception & segfault free I bet.
sfkdjf9j3j 186 days ago [-]
Due to the language's design, C programs are in fact guaranteed to be free of exceptions.
teddyh 186 days ago [-]
Except SIGFPE.
segfaultbuserr 186 days ago [-]
I nominate your comment for the HN comment of the day.
StreamBright 186 days ago [-]
Nitpicking but you are right. Null pointer bugs (null pointer dereference to be precise) are all over the place in C though.

186 days ago [-]
zsrxx 186 days ago [-]
What's a good library for this kind of boilerplate? A lightweight one if possible, i.e. not fucking glib
vmsp 186 days ago [-]
Salvatore Sanfilippo, from Redis fame, has a nice one
chucksmash 186 days ago [-]
Thanks for linking this, it's always good to see how very strong engineers solve problems. Never in a million years would this approach have occurred to me. For the interested, the README in the linked repo is done well and clearly explains the approach itself as well as pros and cons. This also reminded me that I've been meaning to watch antirez' "Writing System Software"[0] videos on YouTube for quite a while now, so thanks for that too.


megous 186 days ago [-]
I'd rather make it explicit that returned/passed value is something other than a regular zero terminated string. Otherwise you can easilly make a mess, if you pass zero terminated string to a function that expects some special bytes in the memory before the pointer.
doboyy 186 days ago [-]
I do this, but I like to just return the actual struct and also avoid typedefing pointer semantics. Can double as a dynamic array too. It's true. We all have string libraries...
zsrxx 186 days ago [-]
I haven't looked much so I can't comment on the quality of the library but that hairdo is glorious. Props to Salvatore!
tlb 186 days ago [-]
There are far too many. APR is a solid choice -- very portable, well-tested, and fast. It uses memory pools a lot, which work well for many styles but not everything.


emmelaich 186 days ago [-]
seamyb88 186 days ago [-]
What's wrong with glib, other than not being lightweight? I love glib.
CameronNemo 186 days ago [-]
libbsd has strlcat and strlcpy.

or just use snprintf everywhere.

adamnemecek 186 days ago [-]
Rust. Half-joking.
commandlinefan 185 days ago [-]
That was my first thought. No strncpy? Ok,

  while (*s1 && n--)  {
    *s2++ = *s1++;
  *s2 = '\0';
it is, then.
fwsgonzo 185 days ago [-]
I don't think strncpy is the function you are looking for, either. Its for zeroing fixed-size character buffers. And it doesn't indicate problems.

jasonhansel 186 days ago [-]
Nice. Is there a similar standalone C library for safe/sane string handling? (Please don't tell me to use C++.)
ayosec 186 days ago [-]
dblock 186 days ago [-]
Windows programmers are familiar with StrSafe.h, and which go a lot further.

Also found on Github with a better list.

WalterGR 186 days ago [-]
IIRC, StrSafe.h - and much more - came from the development of the Microsoft Security Development Lifecycle (SDL).

Code Red was the wake-up call for Microsoft and in February 2002, based on a memo from Bill Gates that first coined the phrase "Trustworthy Computing," Microsoft shutdown Windows development for the first time ever to get a handle on the security issues the products were facing.

pjmlp 186 days ago [-]
Additionally we get to enjoy bounds checked arrays(std::...), and iterators on debug builds, with possibility to selectively enable them in release mode.

While Windows by all means still has its security issues, the toolchain is much more security oriented than most FOSS alternatives thanks to the Windows XP wake up call.

Android and ChromeOS are probably the mostly locked down alternatives on the FOSS space.

uncletaco 186 days ago [-]
Tangential but the fact that I had to open up the machine and remove a screw to completely replace ChromeOS with linux bothers the fuck out of me.
floatboth 186 days ago [-]
That's how secure boot should work. Replacing the root of trust should require serious physical access that can be tamper-evident. And yeah, out of the box, the trust is with the vendor — who else would be trusted in a device that doesn't have an owner yet?
uncletaco 186 days ago [-]
I never thought of that. And it makes a lot more sense in that context.
pjmlp 186 days ago [-]
Well, you can see how they are locking it down with containers sandboxing from an IO talk, by making use of gVisor and a Rust based containers.

"Linux for Chromebooks: Secure Development"

saagarjha 186 days ago [-]
My Chromebook just made me enable developer mode.
toast0 186 days ago [-]
Developer mode lets you boot another OS, but leaves you open to accidentally wiping your drive if you hit the wrong button on boot. Replacing the firmware is a good idea, you do need to remove the write protect screw, but that really makes sense for some note of physical security.
saagarjha 185 days ago [-]
Multiple wrong buttons in a specific sequence, at least on my Chromebook.
temac 186 days ago [-]
I don't know, you can use valgrind and fsanitize and ibstdc++ __gnu_debug:: containers or _LIBCPP_DEBUG under libc++.

Just know your tools...

pjmlp 185 days ago [-]
The big difference is that on MSVC++ those debugging features are enabled by default on debug builds.

File->New C++ Project, already there.

Available to everyone, regardless of their compiler switch mastery level.

Defaults matter.

yesmar 186 days ago [-]
You can find an up-to-date, authoritative version of banned.h here:
slovenlyrobot 186 days ago [-]
That's a surprisingly small list, missing e.g. sscanf / gets / strtok / all the other "usual suspects" at least
asveikau 186 days ago [-]
I don't see a huge problem with strtok as long as it isn't a multi threaded program. It is not in gets territory which is literally impossible to use correctly.
bhk 186 days ago [-]
Synchronous reentry is a problem. Say some loop using strtok calls a function... how do you know that that function doesn't use strtok (or call another function that does...).

Any use of non-const static variables in general has this problem, and strtok is just one example.

derefr 186 days ago [-]
> Say some loop using strtok calls a function... how do you know that that function doesn't use strtok (or call another function that does...).

I’m constantly surprised that C isn’t specified in such a way that lack-of-reentrancy can be determined at compile time. You’d “just” need a symbol table, sort of like the debug symbol table, in each compiled object, holding for each visible symbol the set of function calls marked `__no_reenter` that are predominated by that symbol in control flow. (Yes, some functions do computed jumps, like with longjmp. Just implicitly label those functions __no_reenter unless the programmer explicitly labels them __reenter!)

asveikau 186 days ago [-]
The problem isn't exactly reentry. It's that the function, having already exited, has modified global state, and it needs the same state to be there on the next call. If another call is interleaved (another callee with its own strtok loop) then the state is overwritten with the inner loop, and your own next call won't have its expected contents.

Whereas with a term like reentry I think of two strtok frames on the stack at the same time, which is not possible.

I think it's a bit easier to conceptualize with a function that produces a simple return value in a static buffer. Take getpwnam(). It fills a global structure with info about a given user. So you may keep a pointer to that on your stack. Then you call some other function foo(). foo looks up another user. Suddenly you can't rely on that other call to getpwnam() not having overwritten the data in the result you got back the first time.

bhk 186 days ago [-]
Ah, yes. What I described wasn't reentry. A reentrant function using strtok could exhibit this problem, but it applies more widely than that.
userbinator 186 days ago [-]
how do you know that that function doesn't use strtok (or call another function that does...).

The same way you know a lot of other things about the codebase: By reading and understanding it. Given what strtok() is used for, it's almost always going to be working on a single context at any one time.

bhk 186 days ago [-]
First, many C projects are libraries that are written for reuse in other programs. So "the codebase" is an arbitrary, unknowable thing. We don't have the ability to look at all the other code that will eventually coexist in the same process.

Second, code changes over time. Assuming you inspect 100 lines of code to ensure that they don't call strtok() ... do you sprinkle comments all over as a warning to future programmers not to add any calls to strtok()?

Finally, assuming we could evaluate all of the code in a program at once, it is impractical to build non-trivial programs that way. Instead, we rely on contracts. The idea is that if a caller conforms to its end of the contract, the callee will conform to its end. This allows us to reason about correctness of some code without reading all the other code in the universe and thinking through all the potential code paths. Any function using strtok() would be presenting a rather ugly contract clause "this function destroys strtok's static state". Even worse, such a clause would be contagious, infecting any client of that function, and so on.

missblit 186 days ago [-]
Might as well just use strtok_r everywhere and not worry about if you're using strok safely or not.
lholden 186 days ago [-]
"I don't see a huge problem with insert thing here as long as it isn't a multi threaded program."

Usually a bad omen. :)

slovenlyrobot 186 days ago [-]
I can't remember what's wrong with it, but it's responsible for about eleventy billion CVEs over the years
tropo 185 days ago [-]
You can safely use gets on trusted input. For example, your program might create a pipe or socketpair to communicate between the processes that result from a call to fork. The stdout of one process feeds into the stdout of the other process.

Unless you have an insane gets implementation that writes the bytes backwards, you can also safely use it on untrusted input. Simply place a guard page at the end of the buffer. You can do the call from a forked child that you let die, taking notice of the problem in the parent. On systems without threading (because of locks in libc) you can do a longjmp to recover.

tlb 186 days ago [-]
git does use multiple threads for some longer operations, like gc and pack, so using strtok would be a disaster.
Kaveren 186 days ago [-]
Maybe gets doesn't need to be banned because it got removed in C11 and at least gcc issues a warning on use.
ianai 186 days ago [-]
I hope it grows through community effort. This is a good sanity check sort of thing.
186 days ago [-]
jpjuni0r 186 days ago [-]
I'm curious how sscanf is considered bad?
saagarjha 186 days ago [-]
scanf(“%s”, …) can cause a buffer overflow.
mort96 186 days ago [-]
It sucks that there's no scanf("%.*s"), where you give the buffer length as a separate argument. The fact that it's in a string literal also makes using a compile-time constant really ugly.
claystu 186 days ago [-]
You can overflow buffers all sorts of ways in C.
jcl 186 days ago [-]
Looks like the second vsprintf definition should be "BANNED(vsprintf)", not "BANNED(sprintf)". Not that it matters much, as it only affects the error message.
peff 186 days ago [-]
Yes, you're right. Patches welcome. We do our development on a mailing list; see for details.

However, if you're more comfortable using GitHub PRs, there's a gateway interface at

ttaylorr 186 days ago [-]
I submitted a patch to fix this typo from my fork (which you can see at
gus_massa 186 days ago [-]
Nice catch! You can send a "PR", but note that they have their own custom "PR" procedure, not the standard "PR" procedure in GitHub.

> Git Source Code Mirror - This is a publish-only repository and all pull requests are ignored. Please follow procedure for any of your improvements.

LukeShu 186 days ago [-]
Their "custom" procedure is the `git send-email` command, which has a lot more of a claim to be "standard" than GitHub's custom procedure.
saagarjha 186 days ago [-]
Aside: GitHub’s refusal to provide a feature to close pull requests for mirrors is utterly ridiculous since it requires that people (or bots) go around and disable pull requests and point people to the patch submission process.
OJFord 186 days ago [-]
What's weirdest to me is that it differs from the policy on Issues, which can be turned off in settings.

I sort of get not allowing it, except that it's inconsistent.

yjftsjthsd-h 186 days ago [-]
> not the standard "PR" procedure in GitHub

If it is specific to one implementation, then it is not standard, pretty much by definition.

notatoad 186 days ago [-]
the word "standard" has more than one meaning. it's not standard by the definition of standard that you chose, but you can use your context cues to realize that it's not the definition of standard that OP meant.
zeckalpha 186 days ago [-]
I’m sure they’d welcome a patch.
bigtrakzapzap 185 days ago [-]
These shouldn't be used either (_s are available in C11, _l tend to be locale-parameter thread-safety, and _r have sizes/state for bounds/thread-safety):

- strtok -> strtok_r / strtok_s

- asctime -> strtok_r / strtok_s

- gmtime -> gmtime_r / gmtime_s

- localtime -> localtime_r / localtime_s

- ctime -> ctime_r / ctime_s

- dirname -> dirname_r

- basename -> basename_r

- devname -> devname_r

- readdir -> readdir_r

- ttyname -> ttyname_r

- gamma -> gamma_r

- lgamma -> lgamma_r

- lgammaf -> lgammaf_r

- lgammal -> lgammal_r

- atoi -> atoi_l

- atof -> atof_l

- atol -> atol_l

- atoll -> atoll_l

- gets -> gets_s

- scanf -> scanf_l / scanf_s

- fscanf -> fscanf_l / fscanf_s

- sscanf -> sscanf_l / sscanf_s

- tmpfile -> tmpfile_s

- fopen -> fopen_s

- getenv -> getenv_s

- strdup -> strndup

- strcmp -> strncmp

- strlen -> strnlen

- (Multibyte/wide conversion functions without mbstate_t parameter)

- wcslen / wscnlen -> wcsnlen_s

- wcsncasecmp / wcscasecmp_l / wcsncasecmp -> wcsncasecmp_l

- strcasecmp / strcasecmp_l / strncasecmp -> strncasecmp_l

- bzero (use explicit_bzero, in some cases)

- calloc, realloc -> reallocarray (for arrays of non-byte items)

- memmove -> memmove_s

- strncat -> strncat_s

- strncpy -> strncpy_s

- srand / rand -> rand_r

There are others that are platform-specific. Thread-safety, internal mutable state (not thread-safe), and buffer-overflows are the primary concerns that aren't necessarily applicable in all situations.

masklinn 185 days ago [-]
Annex K was never very popular, is still very easy to misuse[0] and has pretty much been deprecated and slated for removal.


damien_brooklyn 185 days ago [-]
Back in my individual contributor days I got into serious trouble because of this. Where can I find it has officially been deprecated? I tried and I can still use it without getting any warnings?
imrehg 186 days ago [-]
Interesting to see the starting commit for that file, there's quite a bit of discussion and rationale too:
codedokode 186 days ago [-]
What surprises me in C developers is that C exists for probably 40 years but they still don't have proper strings (not just pointers). In many cases there is no large performance penalty for storing string length, and checking it, but they still use pointers or a separate pair of variables for pointer and buffer size instead of single object.
pjmlp 185 days ago [-]
Because they have a security culture that you only get errors due to "holding it wrong" or "every good programmer does it right", despite evidence on the contrary.

Plus added the fact that early C compilers generated quite lousy code on 8 and 16 bit computers, there is this idea to micro-optimize each line of code as it is being written, without any profiling feedback of it actually matters, rather cargo cult how writting code like X is faster than Y.

For example, outside 3D rendering and audio software processing, I never saw a visible impact (to the end user) of bounds checking.

jstimpfle 185 days ago [-]
> For example, outside 3D rendering and audio software processing, I never saw a visible impact (to the end user) of bounds checking.

Indeed the computation overhead on bounds checking is irrelevant for "cold" code, but consider this

1) Pretty sure you can get bounds checking when compilers detect that you're accessing a static array.

2) Otherwise, it's unclear how to devise a system that integrates bounds checking with C semantics. (Yes, that's unfortunate!)

3) Bounds checking does at least increase code size.

pjmlp 185 days ago [-]
1) Totally optional from ISO C point of view, not all C compilers do do it, usually requires static analysis or specific compiler warnings to be enabled. Which not everyone does.

2) Solaris does it perfectly fine on SPARC thanks to tagged memory (ADI). Which Google in collaboration with ARM will make mandatory on future Android releases as well. [0]

3) It hardly mattered in MS-DOS and Amiga LOB applications developed in across Turbo Basic, Quick Basic, GFA Basic, Turbo Pascal, Clipper, so it matters even less nowadays unless we are speaking about PIC like hardware.

Regarding bounds checking I usually refer to Hoare's turing award speech, back in 1981:

"Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."

Or for that matter, the DoD Multics's B2 security evaluation[1], with several remarks how PL/I made the system safer with its string handling, pointer integrity validation and bounds checking.

As noted, yes there are niche cases where bounds checking does have an impact, but for a large spectrum of code that gets daily written it isn't the case.

[0] -

[1] -

jstimpfle 185 days ago [-]
1) It's not standardized but it's very simple to implement nevertheless.

It is my understanding that 2) doesn't have anything to do with C (so given hardware support, you can have it for free whether working in C or not. I think this kind of invalidates your point).

And also, 2) is not perfect, only probabilistic (a source I found says 94% likelyhood to detect OOB).

And it works only for the most basic situations where you use the system allocator and never subpartition these allocations. So, beyond these simple cases that we can get for free without any involvement from C semantics, I still maintain that it is unclear how to devise a reasonable a useful system to do bounds checking that can be added to the C memory model. We can make up an annotation syntax to cover many of the simpler cases, but these are hardly better than plain assertions (which I regularly use).

I doubt Hoare had a good idea to add general bounds checking on a low-level language like C, otherwise that would be standardized by now.

pjmlp 185 days ago [-]
Pointer validation via hardware memory tagging has everything to do with C, because it is only due to C's shortcommings that millions of research dollars keep being spent to try to make it work.

In what concerns Solaris, and the requirements for future Android with ARM memory tagging, the system allocator is all there is, at least from official support point of view.

Hoare hardly needed to pursue such endevour, because all systems programming languages derived from Algol, like ESPOL, NEWP, PL/I, PL/S, PL/8, BLISS,.... were sane regarding bounds checking.

Hardware validation of memory is the only way to tame C, the alternative is to just dump the language, because as proven by ISO C11 dropping Annex K, very few actually care about making the language safe.

However given UNIX's dependency on C, it is also quite clear to me that in the next couple of decades C will be around, long after I am gone, and business opportunities to create companies on top of CVE exploits due to memory corruption bugs.

jstimpfle 185 days ago [-]
> Pointer validation via hardware memory tagging has everything to do with C, because it is only due to C's shortcommings that millions of research dollars keep being spent to try to make it work.

So, to restate, Solaris does not do it "perfectly fine". Thanks for making my point.

> In what concerns Solaris, and the requirements for future Android with ARM memory tagging, the system allocator is all there is, at least from official support point of view.

That's a pity, because if you're not doing your own allocators then you'll have to accept lock contention, extreme memory overhead (for smaller allocations, say <= 64 bytes), and you'll need to match every little allocation with a deallocation, instead of making e.g. custom pool allocators.

You're just not going to write a large infrastructure (i.e. performance-oriented) system in this way.

zxcvbn4038 186 days ago [-]
There are countless libraries that add higher level string functions and no end to higher level languages. C fills the niche where you want something higher level then assembly but lower level then Perl, ruby, python, etc. Sometimes you want or need to manage your own memory. Arduino is a good contemporary example.
stjohnswarts 186 days ago [-]
Exactly. C philosophy is to use libraries and not put things like a better string library in the core functions of the language. It keeps the language relatively clean and easy to understand, unlike c++
unscaled 185 days ago [-]
So the C philosophy is to provide a bad standard string library that is not thread-safe and makes it impossible to write programs without remote code execution vulnerabilities, rather than have a better library with exactly the same functionality but with more security and safety "bloat"?


Yes, C used have a cavalier approach towards security in the past. But if you're asking why the standard library is not fixed yet, I think that instead of pinning it to some lofty philosophy, it's safer to say that good C developers realized long ago that the original C strings are a mistake. Most big C projects define their own string functions and often their own length-prefixed string types. The C standard committee just gave up on fixing this issue in the standard library, but this is not due to philosophy, but because of the impracticality to force a standard solution on this stage.

185 days ago [-]
pjmlp 185 days ago [-]
Bonus points when the libraries are so incompatible among themselves that require extra conversation steps, and then just get dropped 'cause "mind the performance".
jstimpfle 185 days ago [-]
When compatibility is needed, APIs should consume pointer+length pairs. It doesn't get better than that, in ANY language, in terms of simplicity and modularity.

Libraries like Qt with extreme lock-in are at the other end of the spectrum. If it works for them, that's nice. Doesn't work for me. I don't think using a string library is in the spirit of C programming.

pjmlp 185 days ago [-]
Which means either pointer+length get packed into their own structure to avoid mix up errors, thus requiring conversion function calls across libraries, or they are given separately manually, thus opening the door to the copy-paste mistakes from pointer+bad length that they are supposed to protect against.

Qt is as locked-in as any LPGL 3 FOSS project is.

jstimpfle 185 days ago [-]
> Which means either pointer+length get packed into their own structure to avoid mix up errors, thus requiring conversion function calls across libraries, or they are given separately manually, thus opening the door to the copy-paste mistakes from pointer+bad length that they are supposed to protect against.

No, we were talking about compatibility/modularity. You're shifting the topic.

> Qt is as locked-in as any LPGL 3 FOSS project is.

I'm not speaking about licenses lock-in, but about lock-in from an engineering point of view. Are you aware of any significant Qt projects that don't have "Q" all over their codebase?

(And yes, by contrast to GPL, I believe you can use LGPL libraries without suffering a terrible amount of (license) lock-in)

pjmlp 185 days ago [-]
How am I shifting the topic?

Compatibility/modularity doesn't happen in the air, rather in written code.

So either one passes structs around, and somehow they need to be compatible.

Or one passes pointer + lenght as two separated variables, with the consequences to keep in sync two unrelated variables, from the compiler point of view.

185 days ago [-]
pjmlp 185 days ago [-]
Arduino uses C++.
jbboehr 186 days ago [-]
I've used talloc [1] where appropriate to greatly simplify memory allocation and string handling.


emmelaich 186 days ago [-]
C has _only_ pointers for variable size things, not just strings.

Roughly speaking C vars are either known fixed length, or accessed via pointer. There is nothing else.

(except arrays -- which are mostly pointers)

doboyy 185 days ago [-]
The thing you are pointing to could have its length prefixed. I've always assumed that this isn't the case because nobody could commit to the size of the length prefix. Is it string8, string16, string32, or string64? Using a sentinel value to denote length is less opinionated and more portable.
jstimpfle 185 days ago [-]
And also unbeatable in memory efficiency.
doboyy 185 days ago [-]
Unfortunately, it's often algorithmically less efficient and more error prone, so everybody ends up replicating what git is doing with 'strbuf' but in slightly incompatible ways. The effect makes dealing with strings in C unnecessarily unpleasant.
MuffinFlavored 186 days ago [-]
For anybody else:

Why is strncpy insecure?

> strncpy() doesn't require NUL termination, and is therefore susceptible to a variety of exploits.

wruza 186 days ago [-]
It also fills all remaining bytes (if any) with nuls. ‘N’ in its name is not the same ‘n’ as in snprintf etc. It could be named fldcpy, as it works with 0-padded fixed-width fields rather than 0-terminated strings.
vagab0nd 186 days ago [-]
Yes. Also as mentioned in the post, strlcpy (when it's available) is safer.
jandrese 186 days ago [-]
But still not safe, thanks to the return value. Imagine the case where you are using strlcpy because you can't be sure if the source buffer is properly null terminated.
dezgeg 186 days ago [-]
If your buffer isn't a NUL-terminated, then don't call a function that is only defined for NUL-terminated buffers. It's as simple as that.

I'm baffled by how some people claim strlcpy() is 'broken' or 'not safe' because it doesn't handle non-NUL-terminated inputs; the exact same thing applies to just about any function in the C standard library that takes strings as input. Are functions like strchr(), fopen(), printf(), strstr(), setenv() 'broken' as well?

begriffs 186 days ago [-]
> If your buffer isn't a NUL-terminated, then don't call a function that is only defined for NUL-terminated buffers.

It's not that people want to pass strncpy source buffers that lack NUL termination, it's that strncpy in certain situations will not NUL terminate its results.

> some people claim strlcpy() is 'broken'

Speaking of strlcpy, it thankfully doesn't have the problem that strncpy does. However strlcpy is not in the C standard or in POSIX, so can't be used portably. In C99 snprintf is a better choice.

dezgeg 185 days ago [-]
Yes, strncpy() not NUL-terminating it's output is nasty and well-known. But the comment I was responding to was claiming strlcpy() being unsafe.
jandrese 186 days ago [-]
Yes, because cutting off the trailing NULL is a threat vector every C programmer needs to be conscious of whenever they're dealing with strings.

But it's also a problem because maybe the programmer is using strlcpy to grab the first five lines of a 10TB memory mapped file. If you're not thinking about the implications of that return value it can be a real surprise.

186 days ago [-]
fserb 185 days ago [-]
This file could be made much better by having comments pointing to reasonable alternatives to the banned functions.
noname120 186 days ago [-]
Why is there no brief explanations in this code why each function is banned?
asveikau 186 days ago [-]
A C programmer reading this list knows exactly why they are there. It is not at all a controversial list.

Edit: I guess some down voter doesn't believe me but it continues to be true. They are all string functions. Most of them do not take an output buffer size, so a source string exceeding the destination buffer will overflow. Others, like strncpy, take an output buffer size but will not null terminate when exceeded, so the buffer size must be reduced by 1 by the caller and potentially manually terminates, which is too easy to not consider. But any C programmer with significant experience already knows these pitfalls. So it's not controversial.

magnetic 186 days ago [-]
You explained it in 2 lines that would make for great comments.

It seems useful to educate C programmers who don't have "significant experience" because, well, by definition, not every C programmer has "significant experience" and they are the ones who would benefit most from learning this.

It's not about being controversial: nobody's going to disagree with your explanation (it is what it is - it's not an opinion).

asveikau 186 days ago [-]
I don't remember how I learned this. But it happens pretty naturally working with C a lot and doing things like reading documentation and code. To be clear this is not elitism or something of the sort, I am not at all pessimistic that new people won't learn it. It just needs to happen in its due time.

So I would say... What good is a verbose comment to explain? Somebody who doesn't instantly understand why it's banned can read the big warnings on the manpage, or they can google it and land on good explanations on sites like Stack Overflow, or they can look at the function signatures and come up with a correct guess. Then boom. They know. And from there they can assess other interfaces and see if they suffer the same weakness.

This ability tends to come from simple exposure.

bhk 186 days ago [-]
If every C programmer knew, then there would be no need to ban them
Dylan16807 186 days ago [-]
That's not true. People don't staunchly avoid situations that have pointless sharp edges. Putting up caution tape, despite everyone being aware of the danger, is very often a useful activity.
asveikau 186 days ago [-]
I actually wonder why they do it.

Linkers already warn for this stuff. Doing it this way requires every source file to include banned.h, which I would guess is done by including it from some other common header, but that's not fool proof either.

ncmncm 186 days ago [-]
Not knowing defines you as not a C programmer. That doesn't stop many people from trying to code C anyway...

I have seen strtok used, even though it takes more code to use it correctly than not use it. strlcpy is like that, too. Of course, what happens is nobody uses them correctly.

bhk 186 days ago [-]
ncmncm 184 days ago [-]
Does not apply. Nobody is born with C skills.

Competence can be measured objectively. Failure to understand array bounds is prima facie evidence of lack.

ikeyany 186 days ago [-]
FYI that response insinuates that noname120 is not a C programmer...
MrStonedOne 186 days ago [-]
I'm a c programmer, I'm reading this list, I don't know why they are there.













Merg 185 days ago [-]
Just open the docs. Seriously, first link in Google/DDG/Bing on topic of "<any c string routine> safety" will explain it to you. I am self-taught and I can see it.

Seriously, just looking at function arguments shows you, there is a problem under the hood. Reading docs also will point out.

It is open-source, read and learn from Git, Linux code and so on. If you do not have experience is sysdev or high security, just don't contribute there.

This is like saying, construction engineering is gatekeeping beginners, because you have to work for few years before you can start your own project.

mhh__ 186 days ago [-]
I think you can. The whole point of being open source and using a project like cough git (since we're here), is that if you don't like it you can fork it. If you don't allow forks then I agree you aren't open source
philwelch 186 days ago [-]
Looks like they did.
noname120 186 days ago [-]
That's gatekeeping. If a junior developer starts working on this project they might bother a more senior developer who has better things to do with their time, or inline the definition of one of the functions out of frustration.
philwelch 186 days ago [-]
Git, like the Linux kernel, is essential, basic infrastructure that everybody uses and that needs to remain reliable. It's not a volunteer training project for dilettantes and neophytes. Gatekeeping is a deliberate part of how these projects are run and the results speak for themselves.
noname120 186 days ago [-]
Makes sense, thanks for taking the time to answer.
noname120 186 days ago [-]
I'd be interested in knowing which part of the comment you don't agree with and why.
Someone 186 days ago [-]
chucksmash 186 days ago [-]
Thanks for posting these. My company has a similar list of banned C calls so I was already familiar with the rationale. The initial patch[0] in particular is eye opening though. 157 lines of rationale and exploration of alternatives for 22 lines of uncontroversial code.


ttaylorr 186 days ago [-]
There are. It was mentioned that they are available in the 'commit comments' on 'banned.h', but you can find them by blaming down to the original commit which introduced each line.

Here are some of them:

  - strcpy()'s rationale:

  - strcat()'s rationale:

  - strncat()'s rationale:

  - sprintf()'s rationale:
calibas 186 days ago [-]
I believe they're all vulnerable to overflows and easy to exploit.

saagarjha 186 days ago [-]
> As many already know, strcpy(3) is a very unsafe function call in the C library (you should always use strncpy(3) unless you can be exactly certain about the number of bytes you’re going to copy beforehand).

strncpy pretends to be safe but fails to NULL-terminate and has other performance issues that generally lead me to believe it’s a security placebo. I would recommend using snprintf instead.

chucksmash 186 days ago [-]
Whenever you write, you have to consider your audience.

I'd assume there are no comments explaining this because for the intended audience, it is self-evident. "People browsing Hacker News on a Sunday" was probably not the intended audience :-).

noname120 186 days ago [-]
Just to make it clear. I'm aware of the fact that these functions have security implications. Nonetheless when such decisions are made, I believe it's important to explain them—at least briefly—directly in the code itself.
wruza 186 days ago [-]
With the way they are banned, you see it only as a compiler warning on implicit declaration of function (or at linking stage, if warnings are suppressed; idk what exact build process is). Afair, there is no evidence of banned.h in an error trace. So you have to look it up by yourself anyway.

They could rename them as “sorry X is unsafe and creates too much trouble” maybe, but in today’s internet “X considered harmful” is a common search query that everyone in the field is expected to know.

noname120 186 days ago [-]
Fair point. It seems that more and more projects use "-Werror" which causes the compilation to stop and show you exactly where the error comes from[1]. Second point is that searching for "is_a_banned_function" in the code directly points to this file.


wincent 186 days ago [-]
Because Git's own developers know to use git-blame, git-show, and write good commit messages. Comments are not the only place to store meta-info about why code is the way it is.
MrStonedOne 186 days ago [-]
Git is open source, it doesn't have its own developers, every developer is "gits own developers". They really should make this more explicit.
philwelch 186 days ago [-]
Of course Git has its own developers, just as the Linux kernel does. There's no moral, technical, legal, or organizational obligation for the maintainers of any open source project to accept or solicit changes from randos.
wincent 186 days ago [-]
They probably don't want the source code to start resembling a Literate Programming example.
MrStonedOne 185 days ago [-]
I don't think gatekeeping contributors and gatekeeping contributions are the same thing.
kevin_thibedeau 186 days ago [-]
None are properly length checked. strncpy() is for "binary" strings and can omit the NUL in addition to excessive zero padding large buffers.
wyldfire 186 days ago [-]
Collectively "because they're too easy to misuse (and even if used correctly, complicate audits)."
arcturus17 186 days ago [-]
Read the manual!
ohples 186 days ago [-]
Because it's git. It's obscure by design.
b55c63a4 186 days ago [-]
For context, this header file was introduced during the period of Microsoft's acquisition of GitHub.

Git's banned.h roughly approximates the banned functions according to Microsoft's Security Development Lifecycle:

It seems Microsoft once published their own banned.h, but this file is not readily available from the MSDN anymore.

brianpgordon 186 days ago [-]
What are you saying is the connection between GitHub's acquisition and this header? This code is from git, not GitHub.
OJFord 186 days ago [-]
I'm also confused reading that comment, but I suppose the intended link is that GitHub employees are significant git contributors?
philwelch 186 days ago [-]
Yes, I've seen other instances of Microsoft/GitHub making substantial Git contributions:

Microsoft seemed to have been adopting Git already, so it may be that acquiring Github was merely part of a broader Git strategy that Microsoft has adopted.

westurner 186 days ago [-]
FWIW, here's awesome-static-analysis > Programming Languages > C/C++:

These tools have lists of functions not to use. Most of them — at least the security-focused ones — likely also include: strcpy, strcat, strncpy, strncat, sprints, and vsprintf just like banned.h

forrestthewoods 186 days ago [-]
Neat macro. I like the idea.

Suggestion: the macro should recommend a replacement.


tails4e 186 days ago [-]
Delighted to see strncat there, it's a buffer overflow waiting to happen, especially as the name sounds like it's the 'safer version' where it takes a size argument, except it's not the size of the destination buffer, it's the remaining size, so unless you know this, it's wrong for every non empty buffer. strlcat (not standard) operates how you expect.
WalterBright 186 days ago [-]
Whenever I review code and see strncat there, it's almost guaranteed to be a bug. This is for the simple reason that nobody remembers exactly what strncat does, and so misuse it.
saagarjha 186 days ago [-]
What does Git use instead for copying strings? snprintf?

Edit: also interesting is a search for alloca:

LukeShu 186 days ago [-]
Git has an internal "strbuf" library
masklinn 185 days ago [-]
Alternatively it also uses snprintf, though as a small internal variant (xsnprintf) which literally kills the program if the destination buffer is too small:
kazinator 186 days ago [-]
If that header is ever accidentally included before any standard header, it's undefined behavior. :)
anfilt 186 days ago [-]
How? The preprocessor is ran before every thing else? It will overwrite the defintions in the header just like in the source file? You still would get a linking error. The only reason I couldn think is if the the stdlib headers also called undef for those functions. Moreover this is all during build not runtime.
nybble41 185 days ago [-]
According to C99, section 7.1.3:

    If the program declares or defines an identifier in a context in
    which it is reserved (other than as allowed by 7.1.4), or defines
    a reserved identifier as a macro name, the behavior is undefined.
It doesn't matter what the macro would expand to; simply defining the reserved identifier as a macro triggers undefined behavior. That doesn't mean it won't work, for some specific combination of standard headers, compiler, and program source code. It just isn't a strictly conforming program. A conforming implementation is allowed to flag this as an error, ignore the definition, or simply generate nonsense output.
userbinator 186 days ago [-]
Good to see the list is short and relatively sane compared to other "banned function" lists. Unfortunately, "too easy to misuse" is a slippery slope, and gets(), which is probably the best example of a function which is really broken by design, isn't on that list.

I'm surprised that "complicate audits" is given as a reason, because isn't this something static analysers (and I mean ones that actually analyse data/code flow, not dumb pattern-matchers) can easily detect? It's really just asking the question "how long is this/can this be" and following the data back to its origin(s).

pjmlp 186 days ago [-]
As per ISO C11 standard, compliant compilers no longer need to support gets().
rkangel 186 days ago [-]
C aliasing rules can very quickly make "following the data back to its origin" very difficult.
userbinator 186 days ago [-]
Theoretically it's equivalent to the Halting Problem, but in practice I've not seen such difficulty; if aliasing does become a problem to the extent that following dataflow is difficult, I suspect there are already far deeper design flaws in the codebase.
jasonhansel 186 days ago [-]
If I had time, I'd make a library (perhaps musl-based?) that implements "libc but without all the parts you're not actually supposed to use."
bakpakin 186 days ago [-]
Gibbon1 186 days ago [-]
That make me wonder how hard it would be to just strip the bad functions out of the library.
anfilt 186 days ago [-]
Well your std c library is generally just a collection of .o, .a, .lib files. So you could use gnu bin tools to remove symbols of your choosing. However good luck getting your system to build software because that would modify the stdlib for all things compiled on your system. You could also tell gcc or clang not to link to the std library and providr your own with just a few switches on invocation of the compiler
Gibbon1 186 days ago [-]
Other thought would be to put the dlls with unsafe functions in different location than the safer ones and throw up a warning when an application loads them.
kbumsik 186 days ago [-]
Because of possible string buffer overflow?
opticfluorine 186 days ago [-]
Yes, and interestingly, because even when used correctly they "complicate audits". This is an interesting use of preprocessor macros, I'm strongly debating introducing something like this at work.
kbumsik 186 days ago [-]
I'm not an expert in C, but then what's the issue with strncpy() or any "n" functions? It prevents overflow AFAIK. Also what is the alternative (memcpy?) and why?
susam 186 days ago [-]
strncpy() does not guarantee that the copied string would be terminated with a null byte ('\0'). For a call that looks like strncpy(dst, src, n), if there is no null byte in the first n bytes of src, the string copied to dst would also not contain a null byte.

Here is an example code to demonstrate the problem:

  #include <stdio.h>
  #include <string.h>

  int main()
      char a[] = "01234567";
      strncpy(a, "foobar", 4);
      printf("%.8s\n", a);
      return 0;
Here is the output:

  $ cc -std=c89 -Wall -Wextra -pedantic foo.c && ./a.out
A C89-conforming alternative I use is a macro like this that guarantees '\0'-termination as the first thing:

  #define strcp(a, b, c) (a[0] = '\0', strncat(a, b, c - 1))
An example from my code:

Here is how to use it:

  #include <stdio.h>
  #include <string.h>

  #define strcp(a, b, c) (a[0] = '\0', strncat(a, b, c - 1))

  int main()
      char a[] = "01234567";
      strcp(a, "foobar", 4);
      printf("%.8s\n", a);
      return 0;
Here is the output:

  $ cc -std=c89 -Wall -Wextra -pedantic foo.c && ./a.out
saagarjha 186 days ago [-]
Your macro refers to a twice; it might be better as a function.
foo101 186 days ago [-]
What has referring to a variable twice got to do with whether it should be a macro or function?
jfk13 186 days ago [-]
If the strcp() macro is used with a function as the first argument (or an expression that has side-effects), it's not obvious at the call site (or, probably, intended by the programmer) that the argument will be evaluated twice.

E.g. suppose we write something like

    char* buf = strcp(((char*)malloc(4)), "foobar", 4);
expecting this to copy the beginning of the string "foobar" into a newly-allocated 4-byte buffer. Oops... this will actually call malloc twice (leaking the first buffer), and it won't have written the intended '\0' into the buffer that actually ends up getting used, so all bets are off...

If strcp were a function, its first argument would be evaluated just once, and it would work as intended.

mras0 186 days ago [-]
aflag 186 days ago [-]
strncat is also banned, though.
Zardoz84 186 days ago [-]
strncpy have the danger of not putting a null character at the end of the destination , if source is longer than num.

> Copies the first num characters of source to destination. If the end of the source C string (which is signaled by a null-character) is found before num characters have been copied, destination is padded with zeros until a total of num characters have been written to it.

No null-character is implicitly appended at the end of destination if source is longer than num. Thus, in this case, destination shall not be considered a null terminated C string (reading it as such would overflow).

opticfluorine 186 days ago [-]
The issue is that strncpy doesn't guarantee that the destination string is null-terminated after a copy, particularly in the case where n is shorter than the original string, so you could run into overflows if you then ran strlen() or similar against it.

Here's a good explanation from Raymond Chen:

1over137 186 days ago [-]
strlcpy() is a nice alternative, it's from OpenBSD from 1996. Was later added to the other BSDs, Solaris and macOS. But, last I checked, Linux never added them. :(

ncmncm 186 days ago [-]
strlcpy is a terrible alternative. I have never seen it used correctly. It takes more and uglier code to use it correctly than to use strlen and strcpy. People who use it don't bother.

Finding use of strlcpy in a program is a red warning of sloppy code.

There are good reasons it was kept out of glibc for so long.

jandrese 186 days ago [-]
I agree that strlcpy is braindamaged, but why in God's Green Earth isn't there a sensible replacement in stdlib in 2019? This shouldn't be that hard, yet the best alternative is I think snprintf, which is just so ugly. For a long time it wasn't used because compiler support was spotty, but it's been 20 years now so that shouldn't be a major issue anymore.
baot 186 days ago [-]
There is, it's strncpy_s and it was added in C11. And I know, nobody uses C11: that's a security failure.
saagarjha 186 days ago [-]
strncpy_s performs a lot of checks that might not be useful, though.
ncmncm 186 days ago [-]
The (more-or-less) sensible replacement is std::string. Adoption has been spotty, for practical, ideological, and fetishistic reasons.

The first step is to compile the C program with a C++ compiler. Next, start making improvements. If you skip the first step, the ceiling on improvements is limited.

saagarjha 186 days ago [-]
C++≠C, and making the jump isn't always feasible.
ncmncm 186 days ago [-]
As I said.

And when it's not, the ceiling on improvements is limited.

kd5bjo 186 days ago [-]
It doesn’t ensure that there’s a null terminator in the resultant string. This may potentially lead to a later overrun when the copied string is read back.

(See also )

186 days ago [-]
186 days ago [-]
whydoyoucare 186 days ago [-]
It is also important to teach newbies on the pitfalls of certain libc functions. Too much spoon feeding creates a bigger mess in the long run.
miyeahu 186 days ago [-]
Huh, I literally just read through this file a few minutes before this was posted because I was randomly browsing the git source code.
jepler 186 days ago [-]
I guess they care too much about portability to use "pragma GCC poison"?
MaxBarraclough 186 days ago [-]
To rephrase adrianN's comment:

It wouldn't mean breaking things on compilers other than GCC. It would simply add compile-time checks when building with GCC, and would do nothing on other compilers.

zsrxx 186 days ago [-]
Wanting code to compile on something other than gcc is not "caring too much", it's being responsible
adrianN 186 days ago [-]
For this particular case it wouldn't be a big deal. The code is still valid C, it's just that GCC makes sure that you don't use the poisoned values.
The_rationalist 186 days ago [-]
Responsible? What an hyperbolic statement. Could you enlighten me on what would be the significant negative consequences? I hope that your mental equation take into account the missed usefulness of GCC only attributes, flags and extensions by being clang compatible.
zsrxx 186 days ago [-]
I don't know what to tell you. Do you think all compilers should implement all GCC proprietary extensions, basically creating a parallel C standard?

And that's not saying that GCC extensions are not useful, but they should not be considered the norm

klez 186 days ago [-]
Agreed. But I wonder if we already have a sort of parallel C standard already with POSIX (of course the parts about C, not about OS facilities).
Koshkin 186 days ago [-]
> strcpy ... BANNED

But they cannot ban while (* p++ = * q++); now can they.

LukeShu 186 days ago [-]
It's not meant to be a substitute for humans reviewing patches. It's just meant to catch things new contributors are likely to try to use, and give them feedback on it before they send the patch to the mailing list.
rat9988 186 days ago [-]
Truly curious as I don't code in C, or any other low level language. What is the problem with it? That there might not be enough memory allocated at *q?
microtherion 186 days ago [-]
Not enough memory at *p. The loop performs the exact same function that strcpy does, and some people just write it out by hand.
zsrxx 186 days ago [-]
It doesn't stop until *p is \0, so yes
badrabbit 186 days ago [-]
Puts(),gets() would make a good addition?
zzo38computer 186 days ago [-]
I think gets() especially should never be used (but puts() is OK), but I think strcpy() and all of the other stuff they mention there is actually OK to use if you are careful.
hawski 186 days ago [-]
What's wrong with puts? GCC for example will even change a naive printf to puts when applicable.
badrabbit 186 days ago [-]
You're right about puts. I guess I am just used to avoiding both.
Koshkin 186 days ago [-]
K&R is turning in its grave.
kgwgk 186 days ago [-]
You made me check that K is still alive!
basementcat 186 days ago [-]
Koshkin wrote K&R and not K|R.
clX111 186 days ago [-]
There are persons working on git who abuse strcpy()? This sounds a bit like activism and not based on reality.
SopaDeMacaco 186 days ago [-]
Bunch of cucks who need to gimp themselves, just because they don't feel responsible enough to use a tool.
____Sash---701_ 186 days ago [-]
#ifndef BANNED_H #define BANNED_H

/* * This header lists functions that have been banned from our code base, * because they're too easy to misuse (and even if used correctly, * complicate audits). Including this header turns them into compile-time * errors. /

#define BANNED(func) sorry_##func##_is_a_banned_function

#undef strcpy #define strcpy(x,y) BANNED(strcpy) #undef strcat #define strcat(x,y) BANNED(strcat) #undef strncpy #define strncpy(x,y,n) BANNED(strncpy) #undef strncat #define strncat(x,y,n) BANNED(strncat)

#undef sprintf #undef vsprintf #ifdef HAVE_VARIADIC_MACROS #define sprintf(...) BANNED(sprintf) #define vsprintf(...) BANNED(vsprintf) #else #define sprintf(buf,fmt,arg) BANNED(sprintf) #define vsprintf(buf,fmt,arg) BANNED(sprintf) #endif

#endif / BANNED_H */

winter_blue 186 days ago [-]
musl libc (an alternative libc) provides implementations of these functions that are memory-safe. (It only works on Linux though.)
anilakar 186 days ago [-]
Using musl as your standard library causes other problems, namely horrible Python performance and incompatibility with Valgrind.
stonogo 186 days ago [-]
musl is standards-compliant, so it sounds more like "using python or valgrind causes other problems,"
megous 186 days ago [-]
What's slow in particular and on what arch?
anilakar 186 days ago [-]
Alpine. Slow in particular? I don't have the resources to profile everything in our codebase and find the root issue, but others have already benchmarked it:
morelisp 186 days ago [-]
Is is not possible to provide implementations of these functions which are both safe (meaning, within human ability to use correctly) and standards-compliant, as they either don't take any size or require leaving off the null terminator when the destination buffer is full.
mhh__ 186 days ago [-]
If you have a critical need for memory safety then I think it's time to move on from C.

I also have a (I guess?) Slightly unpopular opinion, that C is actually really ugly. Subjective, but C code is usually prone to repeating itself a lot or abstracting in an unsafe manner

saagarjha 186 days ago [-]
What functions does musl provide?
winter_blue 186 days ago [-]
It provides everything that libc does.
saagarjha 186 days ago [-]
So how does it solve the issues with these functions?
winter_blue 186 days ago [-]
It doesn’t — I was wrong.

I was digging through its source code, and it turns out it isn’t memory-safe after all. Here’s the source code for stpcpy:

If dest is too small in the function linked to above, musl’s stpcpy will happy cause a buffer overflow.

Don’t know how or from where I got the impression that musl was a “safer” alternative to libc.

cyphar 186 days ago [-]
The API of strcpy() is unsafe by design. The only way to make it safe is to not use it, and it's not musl's (nor glibc's) fault that this is the case.

musl is interesting for a variety of other reasons, but being more memory safe isn't really one of them.

_kst_ 186 days ago [-]
strpy() is safe when used safely. That means that you've confirmed that the target is big enough.

For example, there's no way this can fail:

    char s[10];
    strcpy(s, "hello");
The problem is that it can also be used unsafely, and except in the simplest cases it's difficult or impossible to tell whether a give usage is safe.

By contrast, gets() (which isn't even in the language anymore) is inherently unsafe, because you can't control what will appear on standard input.