
What I hate about stderr is that it's character-based, not line-based.

I often get output from multiple threads or multiple processes garbled together on the same line. I know how to fix this, but I feel my OS should do it for me.



You can set the buffering mode of any file stream with setvbuf. For example, setvbuf(stderr, NULL, _IOLBF, BUFSIZ) sets stderr to line buffered I/O.
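
A minimal sketch of that setvbuf call wrapped in a function, assuming a hosted C environment. `use_line_buffered_stderr` is a hypothetical name, not a library function. The call has to happen before any other operation on the stream, and it only changes when stdio flushes; it does not by itself promise one write per line.

```c
#include <stdio.h>

/* Hypothetical helper: switch stderr from its default unbuffered mode to
   line-buffered mode. Call it before anything else touches stderr.
   Returns 0 on success, nonzero on failure (same as setvbuf). */
int use_line_buffered_stderr(void)
{
    /* NULL lets stdio allocate the buffer itself; the size argument
       may be used as a hint by the implementation. */
    return setvbuf(stderr, NULL, _IOLBF, BUFSIZ);
}
```

Calling it as the first statement of main() is the simplest way to satisfy the ordering requirement.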


that may help, but if a write writes more than PIPE_BUF bytes, it isn't guaranteed atomic by the kernel. similarly, stdioing a line of more than BUFSIZ may result in multiple write calls. i don't think posix makes any guarantees there (this is just an empirically based speculation) and i'm fairly sure the c standard doesn't


Don't confuse the C stderr (which is of type FILE *) with the POSIX STDERR_FILENO file descriptor (i.e. 2). FILE (in POSIX, and in C since C11) guarantees that each I/O operation is thread safe (and flockfile in POSIX can be used to make larger operations atomic). A low-level POSIX file descriptor is not thread safe (although of course the kernel will protect its own integrity). BUFSIZ only matters when writing to a pipe from distinct file descriptors.
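
A rough sketch of the flockfile usage mentioned here, assuming a POSIX system. `log_pair` is a hypothetical helper. The lock keeps other threads of the same process from interleaving between the individual stdio calls; it has no effect on other processes writing to the same file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: print "key: value\n" with several stdio calls
   while holding the stream lock, so that other threads in this process
   cannot interleave their output between the pieces.
   Returns 0 on success, -1 if any of the writes failed. */
int log_pair(FILE *out, const char *key, const char *value)
{
    int ok;

    flockfile(out);                 /* take the stream's recursive lock */
    ok = fputs(key, out) >= 0
      && fputs(": ", out) >= 0
      && fputs(value, out) >= 0
      && fputc('\n', out) != EOF;
    funlockfile(out);               /* release it again */

    return ok ? 0 : -1;
}
```

For example, log_pair(stderr, "errno", "ENOENT") emits the whole line under one lock acquisition.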


i think the previous discussion may not have been clear enough, because you seem to be discussing a totally different scenario

given this program

    #include <string.h>
    #include <stdio.h>

    char large[16385];

    int main()
    {
      printf("BUFSIZ is %d\n", BUFSIZ);
      memset(large, 'A', sizeof(large));
      large[sizeof(large) - 1] = '\0';
      fprintf(stderr, "%s\n", large);
      return 0;
    }
compiled with `gcc -static` against glibc 2.36-9+deb12u7, we get this strace

    execve("./a.out", ["./a.out"], 0x7fffafcb4a30 /* 49 vars */) = 0
    brk(NULL)                               = 0x1e28000
    brk(0x1e28d00)                          = 0x1e28d00
    arch_prctl(ARCH_SET_FS, 0x1e28380)      = 0
    set_tid_address(0x1e28650)              = 1501924
    set_robust_list(0x1e28660, 24)          = 0
    rseq(0x1e28ca0, 0x20, 0, 0x53053053)    = 0
    prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=9788*1024, rlim_max=RLIM64_INFINITY}) = 0
    readlink("/proc/self/exe", "<censored>", 4096) = 21
    getrandom("<censored>", 8, GRND_NONBLOCK) = 8
    brk(NULL)                               = 0x1e28d00
    brk(0x1e49d00)                          = 0x1e49d00
    brk(0x1e4a000)                          = 0x1e4a000
    mprotect(0x4a0000, 16384, PROT_READ)    = 0
    newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x7), ...}, AT_EMPTY_PATH) = 0
    write(1, "BUFSIZ is 8192\n", 15)        = 15
    write(2, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 8192) = 8192
    write(2, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 8192) = 8192
    write(2, "\n", 1)                       = 1
    exit_group(0)                           = ?
    +++ exited with 0 +++
you can see that the single fprintf call resulted in three separate calls to write(2), even though it is only a single line of desperate screaming. those three calls happen at three separate times, typically on the order of tens of microseconds apart. if that file descriptor is open to, for example, a terminal or pipe or logfile that some other process is also writing to, that other process can write other data during those tens of microseconds, resulting in the intercalation of that other data in the middle of the screaming

threads are completely irrelevant here, except that i guess in an exotic scenario the 'other process' that is writing to the file could conceivably be a different thread in the same process? that would make your remarks about 'distinct file descriptors' and thread safety make sense. but we were talking about entirely separate processes writing to the file, since that's the usual case on unix, and in that case no form of thread-safety is worth squat; what matters is the semantics of the system calls

i don't think posix makes any guarantees about how many calls to write(2)† a call to fprintf(3) will result in, though i haven't actually looked, and i don't think wg14 concerns itself with environment-dependent questions like this at all

______

† or writev(2)
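
One way to sidestep the question of how many write calls stdio makes is to format the line yourself and issue exactly one write(2). The following is only a sketch under assumptions: `write_line_once` is a hypothetical helper, the 4096-byte cap on a line is arbitrary, longer messages are silently truncated, and short writes are not retried. Whether that single write is atomic with respect to other processes still depends on the kernel and the kind of file (for pipes, only up to PIPE_BUF bytes).

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format one message into a stack buffer, append a
   newline, and hand the result to the kernel with a single write(2).
   Returns the number of bytes written, or -1 on error. */
ssize_t write_line_once(int fd, const char *fmt, ...)
{
    char buf[4096];                 /* assumed upper bound on one line */
    va_list ap;
    int n;

    va_start(ap, fmt);
    /* Reserve one byte so the trailing '\n' always fits. */
    n = vsnprintf(buf, sizeof buf - 1, fmt, ap);
    va_end(ap);

    if (n < 0)
        return -1;
    if ((size_t)n > sizeof buf - 2)
        n = (int)(sizeof buf - 2);  /* message was truncated */

    buf[n++] = '\n';
    return write(fd, buf, (size_t)n);
}
```

Running a caller of this under strace should show one write per line rather than the three seen above, as long as the line fits in the buffer.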


What he's saying is that as long as you don't mix usage of stdio and raw write(2), you won't have any interleaving problem; because there's a lock, which is why _unlocked variants exist.


that is correct in its own sphere of applicability, but incorrect in the scenario i was discussing, with entirely separate processes writing to the file, since that's the usual case on unix, because each process has a separate lock

(amelius, however, did mention the possibility of multiple threads!)

it also wasn't what they were saying

this thread is starting to remind me of the 'i'm not your buddy, pal' cascades from reddit


Sorry, I meant specifically threads, so the atomicity is purely process local of course. There is an internal (recursive) mutex inside FILE.


agreed (well, i haven't looked at how glibc implements the thread safety requirement, but i imagine you're correct)


stderr having line buffering turned off by default is intentional. You want to see the output immediately and not have it stuck in a buffer that might be lost if the program crashes or freezes.


In my experience, the biggest offender is programs trying to do syscalls directly (possibly for async-signal-safety), but not being aware of `writev`. Especially programs that do colored output can be really stupid here. Sometimes there are stupid programs that use multiple processes to do colored output even (IIRC CMake is a big offender here, but CMake is infamous for refusing to fix bugs)!
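
For illustration, a sketch of how colored output could go out in one writev call instead of three separate writes. `write_colored` is a hypothetical helper and the escape sequences are ordinary SGR codes chosen as examples. Short writes are not handled.

```c
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: emit color-on sequence, text, and color-off plus
   newline with a single writev(2) call, without going through stdio.
   Returns the number of bytes written, or -1 on error. */
ssize_t write_colored(int fd, const char *sgr, const char *text)
{
    static const char reset[] = "\033[0m\n";   /* reset attributes, end line */
    struct iovec iov[3];

    iov[0].iov_base = (void *)sgr;
    iov[0].iov_len  = strlen(sgr);
    iov[1].iov_base = (void *)text;
    iov[1].iov_len  = strlen(text);
    iov[2].iov_base = (void *)reset;
    iov[2].iov_len  = sizeof reset - 1;        /* drop the trailing NUL */

    return writev(fd, iov, 3);
}
```

For example, write_colored(2, "\033[31m", "error") sends the red "error" line to stderr as one system call.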

The pipe buffer is big enough that sane programs aren't likely to run into problems. The math:

PIPE_BUF is at least 512 per POSIX but in practice 4096 on Linux (probably others too?). If we assume a horrible-and-unlikely 12 formatting characters per real character (and assume a real character is non-BMP and thus 4 bytes, but still single-column), Linux has enough for 64 characters. With more reasonable assumptions (mostly ASCII, no more than 4 formatting changes per line) we get more like 6 lines of output being atomic on Linux, and even POSIX being likely to get at least one whole line.
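
The 64-character figure can be reproduced if each formatting change is taken to be a 5-byte escape sequence such as ESC[31m; that byte count is an assumption, not something stated in the comment. A small sketch of the arithmetic, plus a runtime query for the actual limit (both function names are hypothetical):

```c
#include <unistd.h>

/* How many displayed characters fit in one write of at most pipe_buf
   bytes, if each character costs char_bytes plus seqs_per_char escape
   sequences of seq_bytes each. All parameters are illustrative. */
long chars_per_atomic_write(long pipe_buf, long seqs_per_char,
                            long seq_bytes, long char_bytes)
{
    return pipe_buf / (seqs_per_char * seq_bytes + char_bytes);
}

/* Ask the system for the real PIPE_BUF of a pipe or FIFO descriptor.
   Returns -1 if there is no limit or the query fails. */
long pipe_buf_of(int fd)
{
    return fpathconf(fd, _PC_PIPE_BUF);
}
```

With 12 sequences of 5 bytes and a 4-byte character, each displayed character costs 64 bytes, so 4096 bytes hold 64 characters and the POSIX minimum of 512 bytes holds 8.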


I think it makes sense as a default to avoid issues discerning timing due to a buffer.


Lack of a standard way to control standard stream buffering is a big pain point for me sometimes. I'm still salty the libc+environment based approach was rejected by maintainers. And it also cannot be fixed on the kernel side since buffering is a purely userspace feature.



