324 lines
9.4 KiB
Markdown
324 lines
9.4 KiB
Markdown
---
|
|
permalink: "/{{ year }}/{{ month }}/{{ day }}/ssh-keyscan-fdlim-get-bad-value"
|
|
title: "ssh-keyscan: fdlim_get: bad value"
|
|
published_date: "2023-06-20 15:00:00 +0200"
|
|
layout: post.liquid
|
|
data:
|
|
route: blog
|
|
excerpt: |
|
|
`ssh-keyscan: fdlim_get: bad value` - That's the error message I got the other day when I was trying out some project.
|
|
The web was incredibly useless in telling me what the hell was going wrong here.
|
|
So I set out to find why this was happening, how to fix it and hopefully make this error message findable on the web.
|
|
And this is the story how I found a type confusion bug in some 20-year old OpenSSH code.
|
|
---
|
|
|
|
```
|
|
ssh-keyscan: fdlim_get: bad value
|
|
```
|
|
|
|
That's the error message I got the other day when I was trying out some project.
|
|
The web was incredibly useless in telling me what the hell was going wrong here.
|
|
So I set out to find why this was happening, how to fix it and hopefully make this error message findable on the web.
|
|
And this is the story how I found a type confusion bug in some 20-year old OpenSSH code.
|
|
|
|
## What is `ssh-keyscan`?
|
|
|
|
`ssh-keyscan` is a small utility to "gather SSH public keys from servers" and part of the OpenSSH package (see [the man page][manpage]).
|
|
The one that (most likely) provides you the SSH client and server.
|
|
|
|
You run it like this:
|
|
|
|
```text
|
|
$ ssh-keyscan github.com
|
|
# github.com:22 SSH-2.0-babeld-dca4d356
|
|
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
|
|
<snip>
|
|
```
|
|
|
|
and get all public keys from that host.
|
|
Or if you only need a specific type you pass that:
|
|
|
|
```text
|
|
$ ssh-keyscan -t ed25519 github.com
|
|
# github.com:22 SSH-2.0-babeld-dca4d356
|
|
github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
|
|
```
|
|
|
|
[manpage]: https://manpages.debian.org/bookworm/openssh-client/ssh-keyscan.1.en.html
|
|
|
|
## What went wrong?
|
|
|
|
Running it on my freshly booted M1 MacBook errors out:
|
|
|
|
```
|
|
$ ssh-keyscan github.com
|
|
ssh-keyscan: fdlim_get: bad value
|
|
$ echo $?
|
|
255
|
|
```
|
|
|
|
Yeah, not particularly helpful.
|
|
What's `fdlim_get`? What bad value did it encounter?
|
|
Is this a Mac problem? Or a problem in ssh-keyscan?
|
|
|
|
So I tried from two of my Linux machines. No issues.
|
|
|
|
## What's the code?
|
|
|
|
`fdlim_get` is a function in the `ssh-keyscan` code base in OpenSSH.
|
|
You can find it [in `ssh-keyscan.c` on GitHub](https://github.com/openssh/openssh-portable/blob/b4ac435b4e67f8eb5932d8f59eb5b3cf7dc38df0/ssh-keyscan.c#L129-L144).
|
|
It's supposed to get the maximum and current limit for file descriptors the program can use.
|
|
The error we're seeing is [from a `fdlim_get(1)` call further down in that file][fdlim_get_call].
|
|
|
|
[fdlim_get_call]: https://github.com/openssh/openssh-portable/blob/b4ac435b4e67f8eb5932d8f59eb5b3cf7dc38df0/ssh-keyscan.c#L830-L832
|
|
|
|
```c
|
|
maxfd = fdlim_get(1);
|
|
if (maxfd < 0)
|
|
fatal("%s: fdlim_get: bad value", __progname);
|
|
```
|
|
|
|
Time to compile my own `ssh-keyscan`, so I can modify and debug it:
|
|
|
|
```
|
|
git clone https://github.com/openssh/openssh-portable
|
|
cd openssh-portable
|
|
autoreconf
|
|
./configure --with-ssl-dir=/opt/homebrew/Cellar/openssl@1.1/1.1.1u
|
|
make ssh-keyscan
|
|
```
|
|
|
|
And now I can run it locally:
|
|
|
|
```
|
|
$ ./ssh-keyscan
|
|
usage: ssh-keyscan [-46cDHv] [-f file] [-O option] [-p port] [-T timeout]
|
|
[-t type] [host | addrlist namelist]
|
|
```
|
|
|
|
That was surprisingly easy.
|
|
Let's dive into the code and try to understand it:
|
|
|
|
```c
|
|
static int
|
|
fdlim_get(int hard)
|
|
{
|
|
#if defined(HAVE_GETRLIMIT) && defined(RLIMIT_NOFILE)
|
|
struct rlimit rlfd;
|
|
|
|
if (getrlimit(RLIMIT_NOFILE, &rlfd) == -1)
|
|
return (-1);
|
|
if ((hard ? rlfd.rlim_max : rlfd.rlim_cur) == RLIM_INFINITY)
|
|
return SSH_SYSFDMAX;
|
|
else
|
|
return hard ? rlfd.rlim_max : rlfd.rlim_cur;
|
|
#else
|
|
return SSH_SYSFDMAX;
|
|
#endif
|
|
}
|
|
```
|
|
|
|
Using some printf-debugging is a quick way to see some of those values.
|
|
Adding the following lines right after the `getrlimit` call should tell me more:
|
|
|
|
```c
|
|
printf("int size=%lu\n", sizeof(int));
|
|
printf("type size=%lu\n", sizeof(typeof(rlfd.rlim_max)));
|
|
printf("rlfd.rlim_max=%llu\n", rlfd.rlim_max);
|
|
printf("rlfd.rlim_cur=%llu\n", rlfd.rlim_cur);
|
|
printf("RLIM_INFINITY=%llu\n", RLIM_INFINITY);
|
|
printf("SSH_SYSFDMAX=%ld\n", SSH_SYSFDMAX);
|
|
```
|
|
|
|
After a `make ssh-keyscan` and `./ssh-keyscan github.com` cycle I get:
|
|
|
|
```
|
|
$ ssh-keyscan github.com
|
|
int size=4
|
|
type size=8
|
|
rlfd.rlim_max=9223372036854775807
|
|
rlfd.rlim_cur=9223372036854775807
|
|
RLIM_INFINITY=9223372036854775807
|
|
SSH_SYSFDMAX=9223372036854775807
|
|
ssh-keyscan: fdlim_get: bad value
|
|
```
|
|
|
|
Remember the `fdlim_get(1)` call and check [later][fdlim_get_call] looked like this:
|
|
|
|
```c
|
|
maxfd = fdlim_get(1);
|
|
if (maxfd < 0)
|
|
fatal("%s: fdlim_get: bad value", __progname);
|
|
```
|
|
|
|
And `fdlim_get` is defined to return an `int`, which is only 4 byte wide (that's 32 bit).
|
|
|
|
What's the biggest number one can fit into an int?
|
|
|
|
```c
|
|
printf("INT_MAX=%d\n", INT_MAX);
|
|
```
|
|
|
|
```
|
|
INT_MAX=2147483647
|
|
```
|
|
|
|
That's smaller than `9223372036854775807`.
|
|
What's `9223372036854775807` as a 32-bit integer?
|
|
|
|
```c
|
|
printf("int(SSH_SYSFDMAX)=%d\n", (int)SSH_SYSFDMAX);
|
|
```
|
|
|
|
```
|
|
int(SSH_SYSFDMAX)=-1
|
|
```
|
|
|
|
So from `getrlimit` I get pretty large values, but because `ssh-keyscan` stuffs them into a smaller type, it wraps around and returns `-1`.
|
|
And that's smaller than `0` and thus a `bad value`.
|
|
|
|
## What now?
|
|
|
|
Why am I getting such large values to begin with?
|
|
|
|
```
|
|
$ ulimit -n
|
|
unlimited
|
|
```
|
|
|
|
_(`ulimit -n` shows the file descriptor limit for the current shell)_
|
|
|
|
That's probably a large value.
|
|
How does one change that in macOS?
|
|
Multiple ways!
|
|
First let's ask the OS what is configured:
|
|
|
|
```
|
|
$ launchctl limit maxfiles
|
|
maxfiles 256 unlimited
|
|
```
|
|
|
|
The first number, `256`, is a soft limit and the other, `unlimited`, the hard limit per process.
|
|
Soft limit? Hard limit?
|
|
The soft limit is configurable by the user up to the hard limit, which can only be changed by `root`.
|
|
But there's also a kernel configuration for it:
|
|
|
|
```
|
|
$ sysctl -a | grep maxfiles
|
|
kern.maxfiles: 122880
|
|
kern.maxfilesperproc: 61440
|
|
```
|
|
|
|
That's the hard limit for a single process (`maxfilesperproc=61440`) and for all processes (`maxfiles=122880`).
|
|
This doesn't even match the `launchctl` output.
|
|
|
|
Let's change this using `launchctl`[^1]:
|
|
|
|
```
|
|
$ sudo launchctl limit maxfiles 245760 491520
|
|
$ launchctl limit maxfiles
|
|
maxfiles 245760 491520
|
|
$ sysctl -a | grep maxfiles
|
|
kern.maxfiles: 491520
|
|
kern.maxfilesperproc: 245760
|
|
```
|
|
|
|
Now both outputs match.
|
|
|
|
Did that help with our `ssh-keyscan` problem?
|
|
|
|
```
|
|
$ ulimit -n
|
|
unlimited
|
|
```
|
|
|
|
Still unlimited, I don't have high hopes now.
|
|
|
|
```
|
|
$ ./ssh-keyscan github.com
|
|
int size=4
|
|
type size=8
|
|
rlfd.rlim_max=9223372036854775807
|
|
rlfd.rlim_cur=9223372036854775807
|
|
RLIM_INFINITY=9223372036854775807
|
|
SSH_SYSFDMAX=9223372036854775807
|
|
ssh-keyscan: fdlim_get: bad value
|
|
```
|
|
|
|
And indeed it still fails and I get large values.
|
|
What if I change the limit just for this shell session?
|
|
|
|
```
|
|
$ ulimit -n 245760
|
|
$ ulimit -n
|
|
245760
|
|
$ ./ssh-keyscan github.com
|
|
int size=4
|
|
type size=8
|
|
rlfd.rlim_max=9223372036854775807
|
|
rlfd.rlim_cur=245760
|
|
RLIM_INFINITY=9223372036854775807
|
|
SSH_SYSFDMAX=245760
|
|
int size=4
|
|
type size=8
|
|
rlfd.rlim_max=9223372036854775807
|
|
rlfd.rlim_cur=245760
|
|
RLIM_INFINITY=9223372036854775807
|
|
SSH_SYSFDMAX=245760
|
|
# github.com:22 SSH-2.0-babeld-dca4d356
|
|
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
|
|
<snip>
|
|
```
|
|
|
|
It works!
|
|
_(We get the whole debug output twice, because `fdlim_get` is called twice)_
|
|
|
|
Wait, why did `SSH_SYSFDMAX` change? Isn't that a constant?
|
|
Yes and no:
|
|
|
|
```
|
|
$ grep -R "define SSH_SYSFDMAX" .
|
|
./defines.h:# define SSH_SYSFDMAX sysconf(_SC_OPEN_MAX)
|
|
./defines.h:# define SSH_SYSFDMAX 10000
|
|
```
|
|
|
|
In [`defines.h`](https://github.com/openssh/openssh-portable/blob/b4ac435b4e67f8eb5932d8f59eb5b3cf7dc38df0/defines.h#L728-L733):
|
|
|
|
```c
|
|
/* Maximum number of file descriptors available */
|
|
#ifdef HAVE_SYSCONF
|
|
# define SSH_SYSFDMAX sysconf(_SC_OPEN_MAX)
|
|
#else
|
|
# define SSH_SYSFDMAX 10000
|
|
#endif
|
|
```
|
|
|
|
It's `sysconf(_SC_OPEN_MAX)`, a function call!
|
|
And `_SC_OPEN_MAX` is [defined as](https://manpages.debian.org/bullseye/manpages-dev/sysconf.3.en.html#OPEN_MAX):
|
|
|
|
> The maximum number of files that a process can have open at any time. Must not be less than `_POSIX_OPEN_MAX (20)`.
|
|
|
|
So it's the limit I configured using `ulimit -n 245760` above.
|
|
|
|
I am still confused why `launchctl limit maxfiles` and `sysctl -a` are different on a freshly booted machine,
|
|
but configuring values with `launchctl` then touches those `sysctl` values too.
|
|
|
|
According to ~~some people~~ everyone I asked `ulimit -n` gives them `256`, a small but much more sensible value.
|
|
I still have no clue why it's `unlimited` on my machine.
|
|
|
|
Turns out I ran into that problem 2 years ago in another project (and got it fixed):
|
|
[entr: Segmentation fault on MacBook M1 due to unlimited file descriptors](https://github.com/eradman/entr/issues/63).
|
|
|
|
This MacBook is cursed.
|
|
At least now there will be search results for `fdlim_get: bad value` on the internet.
|
|
|
|
_Update_: I'm not sure what the best way is for OpenSSH to fix this, but I've [filed the issue](https://bugzilla.mindrot.org/show_bug.cgi?id=3581)
|
|
so the team can make the right choice for them.
|
|
_Update 2023-06-22_: Damien Miller has acknowledged the issue and already committed two patches capturing any limit above `INT_MAX` and thus fixing the bug.
|
|
|
|
---
|
|
|
|
_Footnotes:_
|
|
|
|
[^1]: I cannot recommend to run `sudo launchctl limit maxfiles 1024 1024`. You won't be able to shut down your system anymore.
|