There is little meaning for NVIDIA to open-source only the driver portion of the...

gpderetta · on July 18, 2024

It is meaningful because, as you note, it enables a fully opensource userspace driver. Of course the firmware is still proprietary and it increasingly contains more and more logic.

sscarduzio · on July 18, 2024

Which in a way is good, because the hardware will increasingly perform the same on Linux as on Windows.

matheusmoreira · on July 18, 2024

Doesn't seem like a bad tradeoff so long as the proprietary stuff is kept completely isolated with no access to any other parts of my system.

justinclift · on July 18, 2024

Personally, I somewhat wonder about that. The firmware (proprietary) which runs on the gpu seems like it'll have access to do things over the gpu PCIe bus, including read system memory, and access other devices (including network gear). Reading memory of remote hosts (ie RDMA) is also a thing which Nvidia gpus can do.

foresto · on July 18, 2024

Is that not solvable using an IOMMU (assuming hardware that has one)?

justinclift · on July 18, 2024

No idea personally. :)

riehwvfbk · on July 19, 2024

An IOMMU does solve it, at the cost of some performance. The GPU can only access memory that the IOMMU allows, and the part that programs the IOMMU is open source.

RDMA requires a special network card and is opt-in - an RDMA NIC cannot access any random memory, only specially registered regions. One could argue that a NIC FW bug could cause arbitrary memory accesses, but that's another place where an IOMMU would help.
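To make the IOMMU point concrete, here is a toy model in Python. It is only a sketch of the idea (a device can reach only the windows the OS mapped for it), not how any real IOMMU or driver is programmed; `ToyIommu`, `DmaFault` and `dma_read` are made-up names for illustration.

```python
from dataclasses import dataclass, field


class DmaFault(Exception):
    """Raised when a device touches an address the toy IOMMU has not mapped."""


@dataclass
class ToyIommu:
    # device name -> list of (start, length) windows granted by the OS
    windows: dict = field(default_factory=dict)

    def map(self, device, start, length):
        """Grant `device` access to the byte range [start, start + length)."""
        self.windows.setdefault(device, []).append((start, length))

    def check(self, device, addr, size):
        """Allow the access only if it lies entirely inside one granted window."""
        for start, length in self.windows.get(device, []):
            if start <= addr and addr + size <= start + length:
                return True
        raise DmaFault(f"{device}: blocked access to {addr:#x}+{size}")


def dma_read(iommu, memory, device, addr, size):
    """Simulated device read: every access goes through the IOMMU check first."""
    iommu.check(device, addr, size)
    return bytes(memory[addr:addr + size])


if __name__ == "__main__":
    memory = bytearray(4096)
    memory[256:261] = b"frame"     # buffer the OS shares with the GPU
    memory[1024:1030] = b"secret"  # unrelated system memory

    iommu = ToyIommu()
    iommu.map("gpu", 256, 64)      # only this window is visible to the GPU

    print(dma_read(iommu, memory, "gpu", 256, 5))
    try:
        dma_read(iommu, memory, "gpu", 1024, 6)
    except DmaFault as exc:
        print("fault:", exc)
```

The RDMA case has the same shape: registering a region corresponds to the `map` call, and anything outside a registered region is refused.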

justinclift · on July 19, 2024

Awesome, thanks. :)

bayindirh · on July 18, 2024

The GLX libraries are the elephant(s) in the room. Open source kernel modules mean nothing without these libraries. On the other hand, AMD and Intel use the "platform GLX" natively, and with great success.

gpderetta · on July 23, 2024

Mesa already provides good open source GLX and Vulkan libraries. An open source NVIDIA kernel driver enables interoperability with Mesa exactly like Intel and AMD.

bayindirh · on July 23, 2024

Half of the trade secrets NVIDIA has are living in their own GLX libraries. Even if you install the open source kernel module, these GLX libraries are installed (just did it on a new cluster).

I’m not holding my breath for these libraries to be phased out, or for NVIDIA to integrate with the platform GLX, any time soon.

I think NVIDIA will resist moving to a firmware only model (ala AMD & Intel) as long as they can, preferably forever.

pabs3 · on July 18, 2024

The firmware is also signed, so you can't even do reverse engineering to replace it.

paulmd · on July 18, 2024

the open kernel driver also fundamentally breaks the limitation about geforce gpus not being licensed for use in the datacenter. that provision is a driver provision and CUDA does not follow the same license as the driver... really the only significant limitation is that you aren't allowed to use the CUDA toolkit to develop for non-NVIDIA hardware, and some license notice requirements if you redistribute the sample projects or other sample sourcecode. and yeah they paid to develop it, it's proprietary source code, that's reasonable overall.

https://docs.nvidia.com/cuda/eula/index.html

ctrl-f "datacenter": none

so yeah, I'm not sure where the assertion of "no progress" and "nothing meaningful" and "this changes nothing" come from, other than pure fanboyism/anti-fans. before you couldn't write a libre CUDA userland even if you wanted to - the kernel side wasn't there. And now you can, and this allows retiming and clock-up of supported gpus even with nouveau-style libre userlands. Which of course don't grow on trees, but it's still progress.

honestly it's kinda embarrassing that grown-ass adults are still getting their positions from what is functionally just some sick burn in a 2004 viral video or whatever, to the extent they actively oppose the company moving in the direction of libre software at all. but I think with the "linus torvalds" citers, you just can't reason those people out of a position that they didn't reason themselves into. Not only is it an emotionally-driven (and fanboy-driven) mindset, but it's literally not even their own position to begin with, it's just something they're absorbing from youtube via osmosis.

Apple debates and NVIDIA debates always come down to the anti-fans bringing down the discourse. It's honestly sad. https://paulgraham.com/fh.html

it also generally speaks to the long-term success and intellectual victory of the GPL/FSF that people see proprietary software as somehow inherently bad and illegitimate... even when source is available, in some cases. Like CUDA's toolchain and libraries/ecosystem is pretty much the ideal example of a company paying to develop a solution that would not otherwise have been developed, in a market that was (at the time) not really interested until NVIDIA went ahead and proved the value. You don't get to ret-con every single successful software project as being retroactively open-source just because you really really want to run it on a competitor's hardware. But people now have this mindset that if it's not libre then it's somehow illegitimate.

Again, most CUDA stuff is distributed as source, if you want to modify and extend it you can do so, subject to the terms of the CUDA license... and that's not good enough either.

Zambyte · on July 19, 2024

Can you link the source code for CUDA please? Thanks.

Edit since I'm being downvoted: I did search for it and could not find it.

paulmd · on July 19, 2024

https://github.com/NVIDIA/cccl

AshamedCaptain · on July 18, 2024

I really don't know where this crap about "Moving everything to the firmware" is coming from. The kernel part of the nvidia driver has always been small, and this is the only thing they are open-sourcing (they have been announcing it for months now......). The immense majority of the user-space driver is still closed and no one has seen any indications that this may change.

I see no indication that nvidia or any of the other manufacturers has moved any respectable amount of functionality to the firmware either. If you look at the opensource drivers you can even confirm for yourself that the firmware does practically nothing -- the binary blobs for AMD cards are minuscule, for example, and the days of ATOMBIOS are long gone. The drivers are literally generating bytecode-level binaries for the shader units in the GPU, what do you expect the firmware could even do at this point? Re-optimize the compiler output?

There was an example of a GPU that did move everything to the firmware -- the videocore on the raspberry pi, and it was clearly a completely distinct paradigm, as the "driver" would almost literally pass through OpenGL calls to a mailbox, read by the secondary ARM core (more powerful than the main ARM core!) that was basically running the actual driver as "firmware". Nothing I see on nvidia indicates a similar trend, otherwise RE-ing it would be trivial, as happened with the VC.

ploxiln · on July 18, 2024

https://lwn.net/Articles/953144/

> Recently, though, the company has rearchitected its products, adding a large RISC-V processor (the GPU system processor, or GSP) and moving much of the functionality once handled by drivers into the GSP firmware. The company allows that firmware to be used by Linux and shipped by distributors. This arrangement brings a number of advantages; for example, it is now possible for the kernel to do reclocking of NVIDIA GPUs, running them at full speed just like the proprietary drivers can. It is, he said, a big improvement over the Nouveau-only firmware that was provided previously.

> There are a number of disadvantages too, though. The firmware provides no stable ABI, and a lot of the calls it provides are not documented. The firmware files themselves are large, in the range of 20-30MB, and two of them are required for any given device. That significantly bloats a system's /boot directory and initramfs image (which must provide every version of the firmware that the kernel might need), and forces the Nouveau developers to be strict and careful about picking up firmware updates.
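If you want to check the size claim on your own machine, a small Python sketch like the one below will do. The directory `/lib/firmware/nvidia` is an assumption about where your distro installs the blobs, and `firmware_sizes` / `total_mib` are names I made up for this example.

```python
import os


def firmware_sizes(root):
    """Return {relative_path: size_in_bytes} for every regular file under root."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so a shared blob is not counted twice
            sizes[os.path.relpath(path, root)] = os.path.getsize(path)
    return sizes


def total_mib(sizes):
    """Sum the byte sizes and convert to MiB."""
    return sum(sizes.values()) / (1024 * 1024)


if __name__ == "__main__":
    root = "/lib/firmware/nvidia"  # assumed location; adjust for your distro
    sizes = firmware_sizes(root)
    # Show the ten largest files, biggest first.
    for rel, size in sorted(sizes.items(), key=lambda kv: -kv[1])[:10]:
        print(f"{size / (1024 * 1024):8.1f} MiB  {rel}")
    print(f"total: {total_mib(sizes):.1f} MiB in {len(sizes)} files")
```

Note this reports on-disk sizes, so compressed firmware files will show up smaller than the figures quoted above.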

noch · on July 18, 2024

>> I see no indication that nvidia or any of the other manufacturers has moved any respectable amount of functionality to the firmware either.

Someone who believes this could easily prove that they are correct by "simply" taking their 4090 and documenting all its functionality, as was done with the [7900 xtx](https://github.com/geohot/7900xtx).

You can't say "I see no indications/evidence" unless you have proven that there is no evidence, no?

paulmd · on July 18, 2024

so basically “if you really think there’s no proof of a positive claim, then you won’t mind conclusively proving the negation”?

no, that’s not how either logical propositions or burden of proof works

exe34 · on July 19, 2024

He has already told you how to prove it: enumerate the functionality of the driver - the GPU and the code are finite, bounded environments. You can absolutely prove that there is no tea in a cup, that there are no coins in a purse, that there is no cat in a box, etc.

noch · on July 19, 2024

> no, that’s not how either logical propositions or burden of proof works

I think you're missing the point, perhaps intentionally to make a smart-sounding point?

We're programmers, working on _specific physical things_. If I claim that my CPU's branch predictor is not doing something, it is only prudent to find out what it is doing, and enumerate the finite set of what it contains.

Does that make sense? The goal is to figure out _how things actually work_ rather than making claims and arguing past each other until the end of time.

Perhaps you don't care about what the firmware blobs contain, and so you'd rather have an academic debate about logical propositions, but I care about the damn blobs, because it matters for my present and future work.

cpgxiii · on July 18, 2024

These aren't necessarily conflicting assessments. The addition of the GSP to Turing and later GPUs does mean that some behavior can be moved on-device from the drivers. Device initialization and management is an important piece of behavior, certainly, but it is a relatively tiny portion of all the work done by the Nvidia driver, kernel and user-space combined (e.g. compiling/optimizing shaders and kernels, video encode/decode, etc).

phendrenad2 · on July 19, 2024

There IS meaning because this makes it easier to install Nvidia drivers. At least, it reduces the number of failure modes. Now the open-source component can be managed by the kernel team, while the closed-source portion can be changed as needed, not dictated by kernel API changes.

matheusmoreira · on July 18, 2024

Why is the user space component required? Won't they provide sysfs interfaces to control the hardware?

cesarb · on July 18, 2024

It's something common to all modern GPUs, not just NVIDIA: most of the logic is in a user space library loaded by the OpenGL or Vulkan loader into each program. That library writes a stream of commands into a buffer (plus all the necessary data) directly into memory accessible to the GPU, and there's a single system call at the end to ask the operating system kernel to tell the GPU to start reading from that command buffer. That is, other than memory allocation and a few other privileged operations, the user space programs talk directly to the GPU.
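A toy Python model of that flow, for anyone who finds code easier to follow. Everything here is invented for illustration (the opcodes, the 8-byte command format, the class names); real drivers use hardware-specific command formats and submit through ioctls, but the shape is the same: many cheap user-space writes, then one kernel call.

```python
import struct

# Made-up opcodes; each command is two little-endian uint32s: (opcode, argument).
OP_CLEAR, OP_DRAW = 1, 2
COMMAND_SIZE = 8


class ToyKernel:
    """Stands in for the kernel driver plus the GPU consuming the buffer."""

    def __init__(self):
        self.syscalls = 0
        self.executed = []

    def submit(self, buffer):
        """The one privileged step: hand a finished command buffer to the 'GPU'."""
        self.syscalls += 1
        for offset in range(0, len(buffer), COMMAND_SIZE):
            opcode, arg = struct.unpack_from("<II", buffer, offset)
            self.executed.append((opcode, arg))


class ToyUserspaceDriver:
    """Stands in for the library loaded into each program by the GL/Vulkan loader."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.buffer = bytearray()

    def clear(self, color):
        # Plain memory write, no kernel involvement.
        self.buffer += struct.pack("<II", OP_CLEAR, color)

    def draw(self, vertex_count):
        # Plain memory write, no kernel involvement.
        self.buffer += struct.pack("<II", OP_DRAW, vertex_count)

    def flush(self):
        # Single call into the kernel for the whole batch.
        self.kernel.submit(bytes(self.buffer))
        self.buffer.clear()


if __name__ == "__main__":
    kernel = ToyKernel()
    driver = ToyUserspaceDriver(kernel)
    driver.clear(0xFF0000)
    driver.draw(3)
    driver.draw(6)
    driver.flush()
    print(kernel.syscalls, kernel.executed)
```

This is why sysfs knobs alone would not be enough: the interesting work is building the contents of the buffer, and that happens in the user space library.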