permalink | title | published_date | layout | data | excerpt | ||
---|---|---|---|---|---|---|---|
/{{ year }}/{{ month }}/{{ day }}/ssh-keyscan-fdlim-get-bad-value | ssh-keyscan: fdlim_get: bad value | 2023-06-20 15:00:00 +0200 | post.liquid |
|
`ssh-keyscan: fdlim_get: bad value` - That's the error message I got the other day when I was trying out some project. The web was incredibly useless in telling me what the hell was going wrong here. So I set out to find why this was happening, how to fix it and hopefully make this error message findable on the web. And this is the story of how I found a type confusion bug in some 20-year-old OpenSSH code. |
ssh-keyscan: fdlim_get: bad value
That's the error message I got the other day when I was trying out some project. The web was incredibly useless in telling me what the hell was going wrong here. So I set out to find why this was happening, how to fix it and hopefully make this error message findable on the web. And this is the story of how I found a type confusion bug in some 20-year-old OpenSSH code.
What is ssh-keyscan?
ssh-keyscan is a small utility to "gather SSH public keys from servers" and part of the OpenSSH package (see the man page), the one that (most likely) provides your SSH client and server.
You run it like this:
$ ssh-keyscan github.com
# github.com:22 SSH-2.0-babeld-dca4d356
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
<snip>
and get all public keys from that host. Or if you only need a specific type you pass that:
$ ssh-keyscan -t ed25519 github.com
# github.com:22 SSH-2.0-babeld-dca4d356
github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
What went wrong?
Running it on my freshly booted M1 MacBook errors out:
$ ssh-keyscan github.com
ssh-keyscan: fdlim_get: bad value
$ echo $?
255
Yeah, not particularly helpful.
What's fdlim_get? What bad value did it encounter?
Is this a Mac problem? Or a problem in ssh-keyscan?
So I tried from two of my Linux machines. No issues.
What's the code?
fdlim_get is a function in the ssh-keyscan code base in OpenSSH.
You can find it in ssh-keyscan.c on GitHub.
It's supposed to get the maximum and current limit for file descriptors the program can use.
The error we're seeing is from a fdlim_get(1) call further down in that file.
maxfd = fdlim_get(1);
if (maxfd < 0)
    fatal("%s: fdlim_get: bad value", __progname);
Time to compile my own ssh-keyscan, so I can modify and debug it:
git clone https://github.com/openssh/openssh-portable
cd openssh-portable
autoreconf
./configure --with-ssl-dir=/opt/homebrew/Cellar/openssl@1.1/1.1.1u
make ssh-keyscan
And now I can run it locally:
$ ./ssh-keyscan
usage: ssh-keyscan [-46cDHv] [-f file] [-O option] [-p port] [-T timeout]
[-t type] [host | addrlist namelist]
That was surprisingly easy. Let's dive into the code and try to understand it:
static int
fdlim_get(int hard)
{
#if defined(HAVE_GETRLIMIT) && defined(RLIMIT_NOFILE)
    struct rlimit rlfd;

    if (getrlimit(RLIMIT_NOFILE, &rlfd) == -1)
        return (-1);
    if ((hard ? rlfd.rlim_max : rlfd.rlim_cur) == RLIM_INFINITY)
        return SSH_SYSFDMAX;
    else
        return hard ? rlfd.rlim_max : rlfd.rlim_cur;
#else
    return SSH_SYSFDMAX;
#endif
}
Using some printf-debugging is a quick way to see some of those values.
Adding the following lines right after the getrlimit call should tell me more:
printf("int size=%zu\n", sizeof(int));
printf("type size=%zu\n", sizeof(rlfd.rlim_max));
printf("rlfd.rlim_max=%llu\n", (unsigned long long)rlfd.rlim_max);
printf("rlfd.rlim_cur=%llu\n", (unsigned long long)rlfd.rlim_cur);
printf("RLIM_INFINITY=%llu\n", (unsigned long long)RLIM_INFINITY);
printf("SSH_SYSFDMAX=%ld\n", (long)SSH_SYSFDMAX);
After a make ssh-keyscan and ./ssh-keyscan github.com cycle I get:
$ ./ssh-keyscan github.com
int size=4
type size=8
rlfd.rlim_max=9223372036854775807
rlfd.rlim_cur=9223372036854775807
RLIM_INFINITY=9223372036854775807
SSH_SYSFDMAX=9223372036854775807
ssh-keyscan: fdlim_get: bad value
Remember that the fdlim_get(1) call and the check after it looked like this:
maxfd = fdlim_get(1);
if (maxfd < 0)
    fatal("%s: fdlim_get: bad value", __progname);
And fdlim_get is defined to return an int, which is only 4 bytes wide (that's 32 bits).
What's the biggest number one can fit into an int?
printf("INT_MAX=%d\n", INT_MAX);
INT_MAX=2147483647
That's smaller than 9223372036854775807.
What's 9223372036854775807 as a 32-bit integer?
printf("int(SSH_SYSFDMAX)=%d\n", (int)SSH_SYSFDMAX);
int(SSH_SYSFDMAX)=-1
So from getrlimit I get pretty large values, but because ssh-keyscan stuffs them into a smaller type, it wraps around and returns -1.
And that's smaller than 0 and thus a bad value.
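The narrowing step can be reproduced in isolation. This is a minimal sketch, not code from OpenSSH: narrow_to_int and show_narrowing are hypothetical helper names, and the result of converting an out-of-range value to int is implementation-defined in C. On the usual two's-complement compilers the low 32 bits are kept, which is what produces the -1 seen above.

```c
#include <stdio.h>

/* Hypothetical helper: mimics returning a 64-bit limit through an int.
 * Out-of-range conversion is implementation-defined; common compilers
 * keep only the low 32 bits of the value. */
int
narrow_to_int(long long value)
{
    return (int)value;
}

/* Hypothetical helper: prints a value before and after narrowing. */
void
show_narrowing(long long value)
{
    printf("%lld -> %d\n", value, narrow_to_int(value));
}
```

Calling show_narrowing(9223372036854775807LL) prints the large value next to what an int return type would hand back to the caller, while a small value such as 245760 passes through unchanged.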
What now?
Why am I getting such large values to begin with?
$ ulimit -n
unlimited
(ulimit -n shows the file descriptor limit for the current shell)
That's probably a large value. How does one change that in macOS? Multiple ways! First let's ask the OS what is configured:
$ launchctl limit maxfiles
maxfiles 256 unlimited
The first number, 256, is a soft limit and the other, unlimited, the hard limit per process.
Soft limit? Hard limit?
The soft limit is configurable by the user up to the hard limit, which can only be changed by root.
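From inside a program, both values come out of a single getrlimit call. The following is a small sketch under that assumption; read_fd_limits and print_fd_limit are hypothetical names made up for illustration, only getrlimit, RLIMIT_NOFILE and RLIM_INFINITY are the real POSIX interface.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: reads the soft and hard file descriptor limits
 * of the calling process. Returns 0 on success, -1 if getrlimit fails. */
int
read_fd_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    *soft = rl.rlim_cur;  /* soft limit: what the process is held to */
    *hard = rl.rlim_max;  /* hard limit: ceiling for the soft limit */
    return 0;
}

/* Hypothetical helper: prints one limit, spelling out RLIM_INFINITY
 * instead of showing a huge number. */
void
print_fd_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu\n", label, (unsigned long long)value);
}
```

Printing both values this way should roughly mirror what the shell reports with ulimit -n (soft) and ulimit -Hn (hard).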
But there's also a kernel configuration for it:
$ sysctl -a | grep maxfiles
kern.maxfiles: 122880
kern.maxfilesperproc: 61440
That's the hard limit for a single process (maxfilesperproc=61440) and for all processes (maxfiles=122880).
This doesn't even match the launchctl output.
Let's change this using launchctl [1]:
$ sudo launchctl limit maxfiles 245760 491520
$ launchctl limit maxfiles
maxfiles 245760 491520
$ sysctl -a | grep maxfiles
kern.maxfiles: 491520
kern.maxfilesperproc: 245760
Now both outputs match.
Did that help with our ssh-keyscan problem?
$ ulimit -n
unlimited
Still unlimited, I don't have high hopes now.
$ ./ssh-keyscan github.com
int size=4
type size=8
rlfd.rlim_max=9223372036854775807
rlfd.rlim_cur=9223372036854775807
RLIM_INFINITY=9223372036854775807
SSH_SYSFDMAX=9223372036854775807
ssh-keyscan: fdlim_get: bad value
And indeed it still fails and I get large values. What if I change the limit just for this shell session?
$ ulimit -n 245760
$ ulimit -n
245760
$ ./ssh-keyscan github.com
int size=4
type size=8
rlfd.rlim_max=9223372036854775807
rlfd.rlim_cur=245760
RLIM_INFINITY=9223372036854775807
SSH_SYSFDMAX=245760
int size=4
type size=8
rlfd.rlim_max=9223372036854775807
rlfd.rlim_cur=245760
RLIM_INFINITY=9223372036854775807
SSH_SYSFDMAX=245760
# github.com:22 SSH-2.0-babeld-dca4d356
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
<snip>
It works!
(We get the whole debug output twice, because fdlim_get is called twice)
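What ulimit -n does for the shell, a program can do for itself with setrlimit. Here is a minimal sketch assuming a process only wants to lower its own soft limit; cap_soft_fd_limit is a hypothetical name and nothing like it exists in ssh-keyscan.

```c
#include <sys/resource.h>

/* Hypothetical helper: lowers the soft RLIMIT_NOFILE to at most `cap`
 * and leaves the hard limit untouched. Returns 0 on success (also when
 * the soft limit is already low enough), -1 on failure. */
int
cap_soft_fd_limit(rlim_t cap)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    /* Nothing to do if the soft limit is finite and already within cap. */
    if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur <= cap)
        return 0;
    rl.rlim_cur = cap;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Lowering the soft limit needs no special privileges, and the new value is inherited by child processes, just like a ulimit -n set in the shell.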
Wait, why did SSH_SYSFDMAX change? Isn't that a constant?
Yes and no:
$ grep -R "define SSH_SYSFDMAX" .
./defines.h:# define SSH_SYSFDMAX sysconf(_SC_OPEN_MAX)
./defines.h:# define SSH_SYSFDMAX 10000
In defines.h:
/* Maximum number of file descriptors available */
#ifdef HAVE_SYSCONF
# define SSH_SYSFDMAX sysconf(_SC_OPEN_MAX)
#else
# define SSH_SYSFDMAX 10000
#endif
It's sysconf(_SC_OPEN_MAX), a function call!
And _SC_OPEN_MAX is defined as:
The maximum number of files that a process can have open at any time. Must not be less than _POSIX_OPEN_MAX (20).
So it's the limit I configured using ulimit -n 245760 above.
I am still confused why launchctl limit maxfiles and sysctl -a are different on a freshly booted machine, but configuring values with launchctl then touches those sysctl values too.
For everyone I asked, ulimit -n gives them 256, a small but much more sensible value.
I still have no clue why it's unlimited on my machine.
Turns out I ran into that problem 2 years ago in another project (and got it fixed): entr: Segmentation fault on MacBook M1 due to unlimited file descriptors.
This MacBook is cursed.
At least now there will be search results for fdlim_get: bad value on the internet.
Update: I'm not sure what the best way is for OpenSSH to fix this, but I've filed an issue so the team can make the right choice for them.
Update 2023-06-22: Damien Miller has acknowledged the issue and already committed two patches that catch any limit above INT_MAX, thus fixing the bug.
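The general idea of catching limits above INT_MAX can be sketched like this. This is not the actual OpenSSH patch: clamp_to_int and fdlim_get_clamped are hypothetical names, and falling back to INT_MAX when sysconf reports no definite limit is an assumption made for this example.

```c
#include <limits.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: saturates at INT_MAX so the result can never
 * wrap around to a negative number. */
int
clamp_to_int(unsigned long long limit)
{
    return limit > (unsigned long long)INT_MAX ? INT_MAX : (int)limit;
}

/* Hypothetical variant of fdlim_get that clamps every value it returns. */
int
fdlim_get_clamped(int hard)
{
    struct rlimit rlfd;
    rlim_t lim;
    long sysmax;

    if (getrlimit(RLIMIT_NOFILE, &rlfd) == -1)
        return -1;
    lim = hard ? rlfd.rlim_max : rlfd.rlim_cur;
    if (lim == RLIM_INFINITY) {
        sysmax = sysconf(_SC_OPEN_MAX);
        /* Assumption: treat "no definite limit" (-1) as INT_MAX. */
        if (sysmax < 0)
            return INT_MAX;
        return clamp_to_int((unsigned long long)sysmax);
    }
    return clamp_to_int((unsigned long long)lim);
}
```

With this shape, a negative return value can only mean that getrlimit itself failed, so a check like maxfd < 0 no longer trips on an unlimited setting.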
Footnotes:
1. I cannot recommend running sudo launchctl limit maxfiles 1024 1024. You won't be able to shut down your system anymore. ↩︎