Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There is little meaning for NVIDIA to open-source only the driver portion of their cards, since they heavily rely on proprietary firmware and userspace lib (most important!) to do the real job. Firmware is a relatively small issue - this is mostly same for AMD and Intel, since encapsulation reduces work done on driver side and open-sourcing firmware could allow people to do some really unanticipated modification which might heavily threaten even commercial card sale. Nonetheless at least for AMD they still keep a fair share of work done by driver compared to Nvidia. Userspace library is the worst problem, since they handle a lot of GPU control related functionality and graphics API, which is still kept closed-source.

The best thing we can hope is improvement on NVK and RedHat's Nova Driver can put pressure on NVIDIA releasing their user space components.



It is meaningful because, as you note, it enables a fully opensource userspace driver. Of course the firmware is still proprietary and it increasingly contains more and more logic.


Which in a way is good because the hardware will more and more perform identically on Linux as on Windows.


Doesn't seem like a bad tradeoff so long as the proprietary stuff is kept completely isolated with no access to any other parts of my system.


Personally, I somewhat wonder about that. The firmware (proprietary) which runs on the gpu seems like it'll have access to do things over the gpu PCIe bus, including read system memory, and access other devices (including network gear). Reading memory of remote hosts (ie RDMA) is also a thing which Nvidia gpus can do.


Is that not solvable using an IOMMU (assuming hardware that has one)?


No idea personally. :)


An IOMMU does solve it, at the cost of some performance. The GPU can only access memory that the IOMMU allows, and the part that programs the IOMMU is open source.

RDMA requires a special network card and is opt-in - an RDMA NIC cannot access any random memory, only specially registered regions. One could argue that a NIC FW bug could cause arbitrary memory accesses, but that's another place where an IOMMU would help.


Awesome, thanks. :)


The GLX libraries are the elephant(s) in the room. Open source kernel modules mean nothing without these libraries. On the other hand AMD and Intel uses "pltform GLX" natively, and with great success.


Mesa already provides good open source GLX and Vulkan libraries. An open source NVIDIA kernel driver enables interoperability with Mesa exactly like Intel and AMD.


Half of the trade secrets NVIDIA has are living in their own GLX libraries. Even if you install the open source kernel module, these GLX libraries are installed (just did it on a new cluster).

I’m not holding my breath about these libraries to be phased out and NVIDIA integrates to the platform GLX any time soon.

I think NVIDIA will resist moving to a firmware only model (ala AMD & Intel) as long as they can, preferably forever.


The firmware is also signed, so you can't even do reverse engineering to replace it.


the open kernel driver also fundamentally breaks the limitation about geforce gpus not being licensed for use in the datacenter. that provision is a driver provision and CUDA does not follow the same license as the driver... really the only significant limitation is that you aren't allowed to use the CUDA toolkit to develop for non-NVIDIA hardware, and some license notice requirements if you redistribute the sample projects or other sample sourcecode. and yeah they paid to develop it, it's proprietary source code, that's reasonable overall.

https://docs.nvidia.com/cuda/eula/index.html

ctrl-f "datacenter": none

so yeah, I'm not sure where the assertion of "no progress" and "nothing meaningful" and "this changes nothing" come from, other than pure fanboyism/anti-fans. before you couldn't write a libre CUDA userland even if you wanted to - the kernel side wasn't there. And now you can, and this allows retiming and clock-up of supported gpus even with nouveau-style libre userlands. Which of course don't grow on trees, but it's still progress.

honestly it's kinda embarrassing that grown-ass adults are still getting their positions from what is functionally just some sick burn in a 2004 viral video or whatever, to the extent they actively oppose the company moving in the direction of libre software at all. but I think with the "linus torvalds" citers, you just can't reason those people out of a position that they didn't reason themselves into. Not only is it an emotionally-driven (and fanboy-driven) mindset, but it's literally not even their own position to begin with, it's just something they're absorbing from youtube via osmosis.

Apple debates and NVIDIA debates always come down to the anti-fans bringing down the discourse. It's honestly sad. https://paulgraham.com/fh.html

it also generally speaks to the long-term success and intellectual victory of the GPL/FSF that people see proprietary software as somehow inherently bad and illegitimate... even when source is available, in some cases. Like CUDA's toolchain and libraries/ecosystem is pretty much the ideal example of a company paying to develop a solution that would not otherwise have been developed, in a market that was (at the time) not really interested until NVIDIA went ahead and proved the value. You don't get to ret-con every single successful software project as being retroactively open-source just because you really really want to run it on a competitor's hardware. But people now have this mindset that if it's not libre then it's somehow illegitimate.

Again, most CUDA stuff is distributed as source, if you want to modify and extend it you can do so, subject to the terms of the CUDA license... and that's not good enough either.


Can you link the source code for CUDA please? Thanks.

Edit since I'm being downvoted: I did search for it and could not find it.



I really don't know where this crap about "Moving everything to the firmware" is coming from. The kernel part of the nvidia driver has always been small, and this is the only thing they are open-sourcing (they have been announcing it for months now......). The immense majority of the user-space driver is still closed and no one has seen any indications that this may change.

I see no indications either that either nvidia nor any of the rest of the manufacturers has moved any respectable amount of functionality to the firmware. If you look at the opensource drivers you can even confirm by yourself that the firmware does practically nothing -- the size of the binary blobs of AMD cards are minuscule for example, and long are the times of ATOMBIOS. The drivers are literally generating bytecode-level binaries for the shader units in the GPU, what do you expect the firmware could even do at this point? Re-optimize the compiler output?

There was an example of a GPU that did move everything to the firmware -- the videocore on the raspberry pi, and it was clearly a completely distinct paradigm, as the "driver" would almost literally pass through OpenGL calls to a mailbox, read by the secondary ARM core (more powerful than the main ARM core!) that was basically running the actual driver as "firmware". Nothing I see on nvidia indicates a similar trend, otherwise RE-ing it would be trivial, as happened with the VC.


https://lwn.net/Articles/953144/

> Recently, though, the company has rearchitected its products, adding a large RISC-V processor (the GPU system processor, or GSP) and moving much of the functionality once handled by drivers into the GSP firmware. The company allows that firmware to be used by Linux and shipped by distributors. This arrangement brings a number of advantages; for example, it is now possible for the kernel to do reclocking of NVIDIA GPUs, running them at full speed just like the proprietary drivers can. It is, he said, a big improvement over the Nouveau-only firmware that was provided previously.

> There are a number of disadvantages too, though. The firmware provides no stable ABI, and a lot of the calls it provides are not documented. The firmware files themselves are large, in the range of 20-30MB, and two of them are required for any given device. That significantly bloats a system's /boot directory and initramfs image (which must provide every version of the firmware that the kernel might need), and forces the Nouveau developers to be strict and careful about picking up firmware updates.


>> I see no indications either that either nvidia nor any of the rest of the manufacturers has moved any respectable amount of functionality to the firmware.

Someone who believes this could easily prove that they are correct by "simply" taking their 4090 and documenting all its functionality, as was done with the [7900 xtx](https://github.com/geohot/7900xtx).

You can't say "I see no indications/evidence" unless you have proven that there is no evidence, no?


so basically “if you really think there’s no proof of a positive claim, then you won’t mind conclusively proving the negation”?

no, that’s not how either logical propositions or burden of proof works


He has already told you how to prove it: enumerate the functionality of the driver - the GPU and the code are finite, bounded environments. You can absolutely prove that there is no tea in a cup, that there are no coins in a purse, that there is no cat in a box, etc.


> no, that’s not how either logical propositions or burden of proof works

I think you're missing the point, perhaps intentionally to make a smart-sounding point?

We're programmers, working on _specific physical things_. If I claim that my CPU's branch predictor is not doing something, it is only prudent to find out what it is doing, and enumerate the finite set of what it contains.

Does that make sense? The goal is to figure out _how things actually work_ rather than making claims and arguing past each other until the end of time.

Perhaps you don't care about what the firmware blobs contain, and so you'd rather have an academic debate about logical propositions, but I care about the damn blobs, because it matters for my present and future work.


These aren't necessarily conflicting assessments. The addition of the GSP to Turing and later GPUs does mean that some behavior can be moved on-device from the drivers. Device initialization and management is an important piece of behavior, certainly, but in the context of the all work done by the Nvidia driver (both kernel and user-space), it is a relatively tiny portion (e.g. compiling/optimizing shaders and kernels, video encode/decode, etc).


There IS meaning because this makes it easier to install Nvidia drivers. At least, it reduces the number of failure modes. Now the open-source component can be managed by the kernel team, while the closed-source portion can be changed as needed, not dictated by kernel API changes.


Why is the user space component required? Won't they provide sysfs interfaces to control the hardware?


It's something common to all modern GPUs, not just NVIDIA: most of the logic is in a user space library loaded by the OpenGL or Vulkan loader into each program. That library writes a stream of commands into a buffer (plus all the necessary data) directly into memory accessible to the GPU, and there's a single system call at the end to ask the operating system kernel to tell the GPU to start reading from that command buffer. That is, other than memory allocation and a few other privileged operations, the user space programs talk directly to the GPU.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: