From caded160c9acb67edd18401d7948120d29281ba2 Mon Sep 17 00:00:00 2001 From: Jan-Erik Rediger Date: Tue, 20 Jun 2023 14:54:01 +0200 Subject: [PATCH] new post: ssh-keyscan: fdlim_get: bad value --- ...3-06-20-ssh-keyscan-fdlim-get-bad-value.md | 319 ++++++++++++++++++ 1 file changed, 319 insertions(+) create mode 100644 _posts/2023-06-20-ssh-keyscan-fdlim-get-bad-value.md diff --git a/_posts/2023-06-20-ssh-keyscan-fdlim-get-bad-value.md b/_posts/2023-06-20-ssh-keyscan-fdlim-get-bad-value.md new file mode 100644 index 0000000..25bef83 --- /dev/null +++ b/_posts/2023-06-20-ssh-keyscan-fdlim-get-bad-value.md @@ -0,0 +1,319 @@ +--- +permalink: "/{{ year }}/{{ month }}/{{ day }}/ssh-keyscan-fdlim-get-bad-value" +title: "ssh-keyscan: fdlim_get: bad value" +published_date: "2023-06-20 15:00:00 +0200" +layout: post.liquid +data: + route: blog +excerpt: | + `ssh-keyscan: fdlim_get: bad value` - That's the error message I got the other day when I was trying out some project. + The web was incredibly useless in telling me what the hell was going wrong here. + So I set out to find why this was happening, how to fix it and hopefully make this error message findable on the web. + And this is the story how I found a type confusion bug in some 20-year old OpenSSH code. +--- + +``` +ssh-keyscan: fdlim_get: bad value +``` + +That's the error message I got the other day when I was trying out some project. +The web was incredibly useless in telling me what the hell was going wrong here. +So I set out to find why this was happening, how to fix it and hopefully make this error message findable on the web. +And this is the story how I found a type confusion bug in some 20-year old OpenSSH code. + +## What is `ssh-keyscan`? + +`ssh-keyscan` is a small utility to "gather SSH public keys from servers" and part of the OpenSSH package (see [the man page][manpage]). +The one that (most likely) provides you the SSH client and server. + +You run it like this: + +```text +$ ssh-keyscan github.com +# github.com:22 SSH-2.0-babeld-dca4d356 +github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg= + +``` + +and get all public keys from that host. +Or if you only need a specific type you pass that: + +```text +$ ssh-keyscan -t ed25519 github.com +# github.com:22 SSH-2.0-babeld-dca4d356 +github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl +``` + +[manpage]: https://manpages.debian.org/bookworm/openssh-client/ssh-keyscan.1.en.html + +## What went wrong? + +Running it on my freshly booted M1 MacBook errors out: + +``` +$ ssh-keyscan github.com +ssh-keyscan: fdlim_get: bad value +$ echo $? +255 +``` + +Yeah, not particularly helpful. +What's `fdlim_get`? What bad value did it encounter? +Is this a Mac problem? Or a problem in ssh-keyscan? + +So I tried from two of my Linux machines. No issues. + +## What's the code? + +`fdlim_get` is a function in the `ssh-keyscan` code base in OpenSSH. +You can find it [in `ssh-keyscan.c` on GitHub](https://github.com/openssh/openssh-portable/blob/b4ac435b4e67f8eb5932d8f59eb5b3cf7dc38df0/ssh-keyscan.c#L129-L144). +It's supposed to get the maximum and current limit for file descriptors the program can use. +The error we're seeing is [from a `fdlim_get(1)` call further down in that file][fdlim_get_call]. + +[fdlim_get_call]: https://github.com/openssh/openssh-portable/blob/b4ac435b4e67f8eb5932d8f59eb5b3cf7dc38df0/ssh-keyscan.c#L830-L832 + +```c +maxfd = fdlim_get(1); +if (maxfd < 0) + fatal("%s: fdlim_get: bad value", __progname); +``` + +Time to compile my own `ssh-keyscan`, so I can modify and debug it: + +``` +git clone https://github.com/openssh/openssh-portable +cd openssh-portable +autoreconf +./configure --with-ssl-dir=/opt/homebrew/Cellar/openssl@1.1/1.1.1u +make ssh-keyscan +``` + +And now I can run it locally: + +``` +$ ./ssh-keyscan +usage: ssh-keyscan [-46cDHv] [-f file] [-O option] [-p port] [-T timeout] + [-t type] [host | addrlist namelist] +``` + +That was surprisingly easy. +Let's dive into the code and try to understand it: + +```c +static int +fdlim_get(int hard) +{ +#if defined(HAVE_GETRLIMIT) && defined(RLIMIT_NOFILE) + struct rlimit rlfd; + + if (getrlimit(RLIMIT_NOFILE, &rlfd) == -1) + return (-1); + if ((hard ? rlfd.rlim_max : rlfd.rlim_cur) == RLIM_INFINITY) + return SSH_SYSFDMAX; + else + return hard ? rlfd.rlim_max : rlfd.rlim_cur; +#else + return SSH_SYSFDMAX; +#endif +} +``` + +Using some printf-debugging is a quick way to see some of those values. +Adding the following lines right after the `getrlimit` call should tell me more: + +```c +printf("int size=%lu\n", sizeof(int)); +printf("type size=%lu\n", sizeof(typeof(rlfd.rlim_max))); +printf("rlfd.rlim_max=%llu\n", rlfd.rlim_max); +printf("rlfd.rlim_cur=%llu\n", rlfd.rlim_cur); +printf("RLIM_INFINITY=%llu\n", RLIM_INFINITY); +printf("SSH_SYSFDMAX=%ld\n", SSH_SYSFDMAX); +``` + +After a `make ssh-keyscan` and `./ssh-keyscan github.com` cycle I get: + +``` +$ ssh-keyscan github.com +int size=4 +type size=8 +rlfd.rlim_max=9223372036854775807 +rlfd.rlim_cur=9223372036854775807 +RLIM_INFINITY=9223372036854775807 +SSH_SYSFDMAX=9223372036854775807 +ssh-keyscan: fdlim_get: bad value +``` + +Remember the `fdlim_get(1)` call and check [later][fdlim_get_call] looked like this: + +```c +maxfd = fdlim_get(1); +if (maxfd < 0) + fatal("%s: fdlim_get: bad value", __progname); +``` + +And `fdlim_get` is defined to return an `int`, which is only 4 byte wide (that's 32 bit). + +What's the biggest number one can fit into an int? + +```c +printf("INT_MAX=%d\n", INT_MAX); +``` + +``` +INT_MAX=2147483647 +``` + +That's smaller than `9223372036854775807`. +What's `9223372036854775807` as a 32-bit integer? + +```c +printf("int(SSH_SYSFDMAX)=%d\n", (int)SSH_SYSFDMAX); +``` + +``` +int(SSH_SYSFDMAX)=-1 +``` + +So from `getrlimit` I get pretty large values, but because `ssh-keyscan` stuffs them into a smaller type, it wraps around and returns `-1`. +And that's smaller than `0` and thus a `bad value`. + +## What now? + +Why am I getting such large values to begin with? + +``` +$ ulimit -n +unlimited +``` + +_(`ulimit -n` shows the file descriptor limit for the current shell)_ + +That's probably a large value. +How does one change that in macOS? +Multiple ways! +First let's ask the OS what is configured: + +``` +$ launchctl limit maxfiles + maxfiles 256 unlimited +``` + +The first number, `256`, is a soft limit and the other, `unlimited`, the hard limit per process. +Soft limit? Hard limit? +The soft limit is configurable by the user up to the hard limit, which can only be changed by `root`. +But there's also a kernel configuration for it: + +``` +$ sysctl -a | grep maxfiles +kern.maxfiles: 122880 +kern.maxfilesperproc: 61440 +``` + +That's the hard limit for a single process (`maxfilesperproc=61440`) and for all processes (`maxfiles=122880`). +This doesn't even match the `launchctl` output. + +Let's change this using `launchctl`[^1]: + +``` +$ sudo launchctl limit maxfiles 245760 491520 +$ launchctl limit maxfiles + maxfiles 245760 491520 +$ sysctl -a | grep maxfiles +kern.maxfiles: 491520 +kern.maxfilesperproc: 245760 +``` + +Now both outputs match. + +Did that help with our `ssh-keyscan` problem? + +``` +$ ulimit -n +unlimited +``` + +Still unlimited, I don't have high hopes now. + +``` +$ ./ssh-keyscan github.com +int size=4 +type size=8 +rlfd.rlim_max=9223372036854775807 +rlfd.rlim_cur=9223372036854775807 +RLIM_INFINITY=9223372036854775807 +SSH_SYSFDMAX=9223372036854775807 +ssh-keyscan: fdlim_get: bad value +``` + +And indeed it still fails and I get large values. +What if I change the limit just for this shell session? + +``` +$ ulimit -n 245760 +$ ulimit -n +245760 +$ ./ssh-keyscan github.com +int size=4 +type size=8 +rlfd.rlim_max=9223372036854775807 +rlfd.rlim_cur=245760 +RLIM_INFINITY=9223372036854775807 +SSH_SYSFDMAX=245760 +int size=4 +type size=8 +rlfd.rlim_max=9223372036854775807 +rlfd.rlim_cur=245760 +RLIM_INFINITY=9223372036854775807 +SSH_SYSFDMAX=245760 +# github.com:22 SSH-2.0-babeld-dca4d356 +github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg= + +``` + +It works! +_(We get the whole debug output twice, because `fdlim_get` is called twice)_ + +Wait, why did `SSH_SYSFDMAX` change? Isn't that a constant? +Yes and no: + +``` +$ grep -R "define SSH_SYSFDMAX" . +./defines.h:# define SSH_SYSFDMAX sysconf(_SC_OPEN_MAX) +./defines.h:# define SSH_SYSFDMAX 10000 +``` + +In [`defines.h`](https://github.com/openssh/openssh-portable/blob/b4ac435b4e67f8eb5932d8f59eb5b3cf7dc38df0/defines.h#L728-L733): + +```c +/* Maximum number of file descriptors available */ +#ifdef HAVE_SYSCONF +# define SSH_SYSFDMAX sysconf(_SC_OPEN_MAX) +#else +# define SSH_SYSFDMAX 10000 +#endif +``` + +It's `sysconf(_SC_OPEN_MAX)`, a function call! +And `_SC_OPEN_MAX` is [defined as](https://manpages.debian.org/bullseye/manpages-dev/sysconf.3.en.html#OPEN_MAX): + +> The maximum number of files that a process can have open at any time. Must not be less than `_POSIX_OPEN_MAX (20)`. + +So it's the limit I configured using `ulimit -n 245760` above. + +I am still confused why `launchctl limit maxfiles` and `sysctl -a` are different on a freshly booted machine, +but configuring values with `launchctl` then touches those `sysctl` values too. + +According to ~~some people~~ everyone I asked `ulimit -n` gives them `256`, a small but much more sensible value. +I still have no clue why it's `unlimited` on my machine. + +Turns out I ran into that problem 2 years ago in another project (and got it fixed): +[entr: Segmentation fault on MacBook M1 due to unlimited file descriptors](https://github.com/eradman/entr/issues/63). + +This MacBook is cursed. +At least now there will be search results for `fdlim_get: bad value` on the internet. + +--- + +_Footnotes:_ + +[^1]: I cannot recommend to run `sudo launchctl limit maxfiles 1024 1024`. You won't be able to shut down your system anymore.