Rust: 128 bit integers preparing to be released (github.com/rust-lang)
278 points by geoffreyiy1 on Dec 4, 2016 | hide | past | favorite | 162 comments


This title implies some things that aren't quite right. Let's take a step back.

The current release of stable Rust is 1.13, beta is 1.14, nightly is 1.15.

New features land on master, hence 1.15. So this means that, if it does land, it will become available on nightly soon. But it's still behind a feature flag. So it won't actually come out in Rust 1.15.

I am not on the relevant subteam here, so I am not 100% sure of the procedure, but usually, stuff has to sit in nightly for a full release cycle to be eligible for stabilization. So if this lands tonight, it'll be in nightly 1.15, 1.16 will be its full cycle, and it'll be eligible for release in 1.17, which would be March 16, 2017.


Yea I didn't know how to word it without it becoming more of a sentence and less of a title.


Totally fair! I'm not sure what I would have said either.


In case anyone else is interested in reading it, the relevant RFC is here: https://github.com/rust-lang/rfcs/blob/master/text/1504-int1...


The RFC doesn't seem to mention anything about the atomicity (or lack thereof) of loads and stores of variables of the 128-bit types. From the discussion, it seems that on a number of current architectures, they're not going to be atomic.

Does this cause any problems in the Rust view of the world? Will there be unanticipated issues for developers who have been getting along just fine so far unknowingly assuming that all operations are atomic, which they have been just by virtue of how the hardware works?


I think what drfuchs might be getting at is that 128-bit integers would need two registers on a 64-bit architecture. Or four on a 32-bit one.

LLVM supports arbitrary-length integer types, regardless of the platform. Rust already has u64 and i64 on 32-bit architectures.

But even if something fits in one register, a variable isn't thread safe anyway.

Rust DOES NOT allow a mutable reference to exist simultaneously with non-mutable ones.

For thread safety, you always need special types, which Rust provides:

https://doc.rust-lang.org/std/sync/atomic/

https://doc.rust-lang.org/std/sync/struct.Arc.html


Thanks, but to be even more explicit, I'm thinking of a lower level than thread safety and the nice atomic references Rust provides in "safe" code: Consider a systems-y program that handles signals, or even hardware interrupts, or mmaps a chunk of shared memory. Any Read of a 32-bit variable will produce a value that was Written previously, no matter what; the hardware guarantees it. But a Read of a 128-bit variable may result in a value that has the high 64 bits from one earlier Write, and the low 64 bits from a more recent Write that was interrupted (or vice-versa). That's different than what many folks have ever experienced, messes up some algorithms, and is sure confusing the first time you see it.

True, this won't come up in a run-of-the-mill program, but if Rust is going to supplant C for OS work or database implementations, this is an issue to be aware of.

Succinctly: If you're in unsafe code, using 128-bit integers, you may experience behavior you've never seen before.


All the Rust OSes written right now use pretty tightly scoped unsafe code. The OS isn't just unsafe-blocks-everywhere.

The compiler would force you to use an atomic. Once you realize no AtomicUint128 type exists, you may try to use unsafe explicitly to make things work, but even then that requires a lot more explicitness than just an unsafe block. You will realize your mistake at some point along the way.

Unsafe in Rust doesn't just turn off all the checks. It gives you the power to circumvent the checks, but these circumventions are still individually explicit.

Anyway, u128 doesn't worsen this situation. You could already make thread-safety mistakes with other wide value types if you really wanted to, just as you could with u128.


What stops you from manually entering a critical region yourself before you do the assignment?


Could you elaborate on what you mean here? The compiler stops you from trying to mutate things when they're not uniquely owned or atomic (or behind a mutex). When you have a non-unique reference you need to use a bunch of unsafe function calls to force mutation on it.


Not GP, but iiuc he is just stating: In order to prevent these kinds of race conditions you can just use any regular locking mechanism. (As you could with many other cases where you'd like to introduce atomicity manually.)


Right, and the compiler forces you to do so.


All valid points that one needs to consider when writing low level code.

The good thing about Rust is that you actively have to circumvent the language's safety checks with unsafe code to run into those problems.

And if you are using unsafe code, you should be aware of such low level considerations anyway, and would refrain from using types when not appropriate.

The consistency problems you describe apply to any data structure, after all.


That kind of non-safe memory sharing doesn't exist in Rust, unless you explicitly mark code as 'unsafe'. If you want to write a variable, nobody else is allowed to read it until you are finished, and this is enforced at compile time via the language design. The same is true for all data structures.


One of Rust's strongest features is its ability to outlaw data races. In this case, the 128-bit types only provide mutations that take &mut self, that is, a unique unaliased pointer, meaning there's no way to concurrently mutate and hence they are automatically atomic, in a sense.


Does Rust run a memory barrier when ownership is transferred? How does it ensure that the previous owner has finished all its writes when a new owner takes over?


To transfer ownership, you would wrap the object in Mutex<T>, send it via Sender<T>, use some other synchronization primitive, or simply pass it over at the moment the thread is spawned.

It's up to the synchronization primitives to ensure memory barriers are executed, not the language. So e.g. a Mutex<T> would execute a memory barrier when locking and unlocking. Rust then allows you to pass ownership in a trusted manner only, i.e. through those primitives.

Of course, you can also create an unsafe block and say: "Rust, trust me, I know what I'm doing, don't bother me, and these are the invariants I want you to enforce in safe code..." Mutexes are implemented this way.
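As a minimal sketch of that (assuming modern stable Rust), here's ownership transfer through a standard-library channel; the channel's internals do the synchronization, so the receiver is guaranteed to see everything the sender wrote before the send:

```rust
use std::sync::mpsc;
use std::thread;

// Ownership of `payload` moves into the spawned thread, and then moves
// again through the channel. The mpsc internals perform the necessary
// synchronization, so no manual memory barrier is needed.
fn transfer() -> Vec<u64> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let payload = vec![1u64 << 40, 2, 3];
        tx.send(payload).unwrap(); // ownership transferred here
    });
    rx.recv().unwrap() // the receiving thread now owns the data
}

fn main() {
    assert_eq!(transfer(), vec![1u64 << 40, 2, 3]);
}
```

Note `transfer` is just an illustrative name; the point is that `send`/`recv` pair up as the synchronization edge.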


It's built into the type system and checked statically at compile time. In Rust, there can only be one 'owner' of a stored value. Ownership can be transferred, borrowed by multiple readers, or borrowed by a single writer. This ensures (at compile time) that at any given time there is only ever a single writer OR multiple readers. Rust programs are guaranteed to be free of data races at compile time. The above covers the vast majority of situations, but for cases where this is too restrictive there are other language constructs such as reference-counted types.


Right - I get that the compiler ensures it is only accessed by one thread at a time.

But what I was asking was at runtime, when ownership transfers how does Rust ensure that writes to the value by the previous owner appear to the new owner before the transfer of ownership appears to the new owner?

You can statically guarantee that only one thread owns the object, but you can't statically guarantee the order in which the processor will apply the instructions your compiler generates, without barriers.

But the other person answered - you need to ensure that there is an explicit memory barrier yourself when you transfer the object.


> you need to ensure that there is an explicit memory barrier yourself when you transfer the object

To add to/clarify this: I don't think you can transfer an object to another thread in safe Rust code without a primitive that will handle the barriers for you. Static ownership tracing doesn't actually know what threads are, because it doesn't even need to.


> But the other person answered - you need to ensure that there is an explicit memory barrier yourself when you transfer the object.

The channels do this.


> you need to ensure that there is an explicit memory barrier yourself when you transfer the object.

To be clear, you need to write the code yourself, but the compiler won't let you transfer ownership between threads without doing so.

There is no way to compile your code without proving to the compiler that you're data race free.


All of the standard library primitives that let you view writes from other threads (mutexes and mpsc senders) have memory barriers in them at some point.

The language itself doesn't know about memory barriers. It gives you the tools for enforcing thread safety provided your synchronization abstractions have memory barriers in the right place.


Rust has separate atomic types for doing atomic operations, so currently, none of the plain integer types are atomic.

So, if you wanted an atomic type, you'd find it here: https://doc.rust-lang.org/stable/std/sync/atomic/index.html

In other words, no, it shouldn't be a problem, at least in my understanding.


Right, but the point is that to date, being careless about asking for an atomic type explicitly would not lead to buggy behavior. But now it will, so some folks are going to be stumped when they use their old habits and get flaky results. Not that there's anything to be done, other than warning folks that they can't just widen their integer types and expect it to work without checking for this case.


> Right, but the point is that to date, being careless about asking for an atomic type explicitly would not lead to buggy behavior.

That's already the case: simply reading a 64-bit integer on a 32-bit processor is not atomic, and that's already possible today.

But that's not a problem in Rust. How would you access the same 128-bit integer from more than one thread?

- You pass the thread a copy of the value: the thread's copy can't be accessed by any other thread, so whether or not it's atomic doesn't matter.

- You pass the thread a mutable reference (&mut) to the value: one of Rust's rules is that there can be only one &mut to a memory location, and while that &mut exists, nothing else can read or modify the value. Therefore it's the same case: only one thread can access the value, so whether or not it's atomic doesn't matter.

- You pass the thread a non-mutable reference (&) to the value: another of Rust's rules is that while any non-mutable reference exists, nothing can modify the value, so the reads not being atomic don't matter.

- You wrap the value in something like a Mutex: the Mutex has a lock which prevents concurrent accesses in the middle of a write.

- You use an Atomic version of the 128-bit integer: this one doesn't exist, so can't be used.

The main reason for all the interest in Rust is that the compiler protects you from many kinds of mistakes. Accessing the same variable from more than one thread, without using a lock or an atomic, is one of the things the compiler protects you from.
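To make that concrete, here's a small sketch (assuming modern Rust where u128 exists) of the Mutex case from the list above: the compiler simply won't let you mutate a bare u128 from several threads, so you go through a lock and torn reads become impossible:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// A plain u128 can't be mutated from several threads in safe Rust; the
// compiler forces it behind a synchronization primitive such as a Mutex.
fn parallel_sum() -> u128 {
    let total = Arc::new(Mutex::new(0u128));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..1000 {
                    *total.lock().unwrap() += 1; // lock prevents torn writes
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}

fn main() {
    assert_eq!(parallel_sum(), 4000);
}
```

Delete the Mutex and try `*total += 1` directly and the program stops compiling, which is the whole point.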


The situation where the atomicity of integer operations matter is when there are multiple references to a single memory location, at least one of which is mutable.

The entire basis of the memory protection model of Rust is that it makes the above impossible in safe code. If you attempt to create such a situation, the compiler will fail to compile your code. Every single integer operation could stop being atomic and not a single line of safe rust would break.


In Rust, you don't get buggy behavior in this kind of case: your program fails to compile.


Perhaps a stupid question, but can't this be generalized to arbitrary size integers?


LLVM supports arbitrarily sized integer types, but sadly it's not exposed in Rust.

The only language I've seen where this is possible is Julia (which uses LLVM too).


Swift uses them in the standard library to do cute things like a space efficient `Int63?`, but doesn't currently expose them as a general user facility.


Dart (the VM version, not the dart2js output) has arbitrarily sized integer types (`int`), as well as fixed-length ones if you need better maths support (SIMD with vector and matrix calculations).


However, Julia is a dynamic language and Rust is a static one.


I find your usage of the word static ambiguous. Also, Julia is more static than you might expect, and besides, it's still a compiled language; it just has a `dynamic` type (à la C#, for example) as its default type. It seems dynamic - like most Lisps do - because it supports recompiling a function in a running session (however, it doesn't change already-compiled code, so not that dynamic).


Yeah, Julia and lisps demand more nuance than static or dynamic labels. This goes into the details for Julia: http://stackoverflow.com/a/28096079

Compilation strategy and code hotswapping are independent of static vs dynamic types. Java is statically typed, however its implementations support interpreted execution and code hotswapping.


As of? Would you mind elaborating?


Is there an upper maximum threshold?


It could (there are 3rd party Rust libraries for arbitrary-size integers), but Rust focuses more on low-level features.

Types that have fixed size at compile time are more useful in Rust and can be optimized much better.


I didn't mean dynamically sized integers. I meant at-compile-time-arbitrarily-sized-integers ;)


Ah, in that case I suppose it's blocked by lack of type parametrization with integers: https://github.com/rust-lang/rfcs/issues/1038 https://github.com/ticki/rfcs/blob/pi-types/text/0000-pi-typ...


I implemented those for D: https://github.com/d-gamedev-team/gfm/blob/master/integers/g... It's nothing special when you have adhoc templates.


It would also be good to address arbitrary-size floating-point numbers in Rust.

Huon Wilson worked on an implementation a year ago:

https://github.com/huonw/float

Related: there was a discussion a few years back on Reddit about future-proofing math/numbers in Rust:

https://www.reddit.com/r/rust/comments/1uy7rt/an_appeal_for_...

However, when it comes to speed, working with primitive types has gotta be faster if supported natively, so anything else anytime soon will play second fiddle.


What sorts of applications/domains need or benefit from having 128 bit integers?


The main operation I can think of is multiplying by a ratio without losing precision. If all your values are specified in 64-bit fixed point, a 128-bit integer type gives you something with enough bits for the intermediate product. It's not something I've really dealt with myself, but I seem to remember reading that this is really handy to have in some physics simulations and control system algorithms that do numerical integration. At any rate, it's important enough that many Forth dialects have a word for this specific operation [1].

[1] https://www.forth.com/starting-forth/5-fixed-point-arithmeti...
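That Forth `*/` idiom translates directly once a 128-bit intermediate exists. A minimal sketch in modern Rust (where u128 has since landed); `mul_div` is a made-up helper name:

```rust
// Multiply a u64 by a ratio num/den without overflowing the intermediate
// product, by widening to u128 first (the Forth "*/" idiom).
fn mul_div(x: u64, num: u64, den: u64) -> u64 {
    ((x as u128 * num as u128) / den as u128) as u64
}

fn main() {
    // 10^19 * 7 overflows u64 (max ~1.8 * 10^19), but the u128
    // intermediate holds it comfortably.
    assert_eq!(
        mul_div(10_000_000_000_000_000_000, 7, 10),
        7_000_000_000_000_000_000
    );
}
```

Without the widening cast, the multiply would wrap (or panic in debug builds) long before the division could bring the value back into range.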


That's less of a concern for 64-bit (even for signed arithmetic, intermediary results can be close to 1.0E19) than for the typical 16-bit values and intermediary 32-bit result that Forth was designed for.

It also is a lot less of a concern on modern systems, where floating point operations aren't much slower than integer ones.


Or more generally fixed point with big numbers and lots of decimals.


One example I know of is the Poly1305 MAC algorithm, which at its core multiplies two numbers and reduces the result modulo 2^130-5. The C implementation at https://github.com/floodyberry/poly1305-donna shows three ways to represent the 130-bit numbers during the calculation: as ten 13-bit limbs, as five 26-bit limbs, or as two 44-bit limbs plus one 42-bit limb. In the middle of the multiplication, before the carries are propagated, you need twice the number of bits: the first option needs 26 bits, which can be done with a 16x16->32 bit multiplication. The second option needs 52 bits, which can be done with a 32x32->64 bit multiplication. The last option, however, needs 88 bits, which won't fit in 64 bits; you need a 128-bit integer, so you can do a 64x64->128 bit multiplication.

At least on x86-64, the 64x64->128 bit multiplication is a single instruction, like the 32x32->64 bit and the 16x16->32 bit multiplications. Doing the calculation with only three limbs is clearly faster than doing it with five limbs; to start with, using 3 limbs you need 9 multiplies, while with 5 limbs you have to do 25 multiplies. The carry propagation and reduction steps also take time proportional to the number of limbs.
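The 64x64->128 widening multiply described above is trivial to express once the language has u128; a sketch (the `(high, low)` split is just one convention):

```rust
// A 64x64 -> 128 bit widening multiply: on x86-64 the u128 product
// compiles down to a single MUL, leaving the two halves in registers.
fn widening_mul(a: u64, b: u64) -> (u64, u64) {
    let wide = a as u128 * b as u128;
    ((wide >> 64) as u64, wide as u64) // (high, low)
}

fn main() {
    // (2^64 - 1)^2 = (2^64 - 2) * 2^64 + 1
    assert_eq!(widening_mul(u64::MAX, u64::MAX), (u64::MAX - 1, 1));
}
```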


GUIDs are usually 128 bits. So are IPv6 addresses. Could be nice to store them in a single, native type? Probably no easier than a byte array though...


Then you could do this:

    ping6 42540577535212633203815888880477462122
Like you can do this:

    ping 2158835347
    PING 2158835347 (128.173.54.147) 56(84) bytes of data.
    64 bytes from 128.173.54.147: icmp_seq=1 ttl=52 time=20.4 ms
    64 bytes from 128.173.54.147: icmp_seq=2 ttl=52 time=20.5 ms


Neither one is used like an integer though. As in, you don't really add GUIDs together¹, so there's no real benefit over a byte array or struct.

1. I guess having bit mask operations for IPv6 addresses could be useful.


Avoiding pointers to arrays is nice.

This is the big annoyance in Golang with its net.IP type - it's `type IP []byte`, which means you can't write ip1 == ip2, you can't pass it by value easily, nor use it as a map key.

I've ended up inventing my own type for that a lot of the time, as a struct with static fields, since those you can copy around and do 1:1 comparisons.

Though to be honest I'd be super happy if there was a drop-in varint type, or if you could trivially have the compiler generate instructions for an arbitrary fixed-size int.


Rust offers non-allocating fixed-sized arrays, so a GUID could be a (wrapper around) [u8; 16] and an IPv6 address could be [u16; 8].
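A sketch of what that looks like (`Guid` and `Ipv6` are hypothetical wrapper names): the arrays live inline on the stack, and the wrappers are copyable, comparable, and hashable for free:

```rust
// Hypothetical newtypes over fixed-size arrays: no allocation, no
// pointer indirection, and derive gives ==, hashing, and plain copies.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Guid([u8; 16]);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Ipv6([u16; 8]);

fn main() {
    let a = Guid(*b"0123456789abcdef");
    let b = a; // plain copy, no allocation
    assert_eq!(a, b);
    let _v6 = Ipv6([0x2001, 0xdb8, 0, 0, 0, 0, 0, 1]);
}
```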


That's exactly how the UUID crate implements them: https://github.com/rust-lang-nursery/uuid/blob/master/src/li...


So does Go, so a net.IP could be [16]byte or [4]byte, but I'm sure you can see the obvious problems that might occur there (with having two separate IP types).

Most UUID libraries I've seen and written use [16]byte as the concrete UUID type.


I don't see the obvious problem (fwiw, I really dislike such rhetorical devices): presumably net.IP could be a struct that contains a [16]byte and an isV4 flag.


The problem is [16]byte would always be wasting 12 bytes for IPv4 addresses if you only used one type for both. So, two types must be made which requires extra code and doesn't allow for == comparison which is what OP (GP? whomever...) was complaining about not being able to do since net.IP is []byte.

Sorry for the rhetorical device.


Isn't the indirection overhead of a []byte more expensive than a spare 12 inline bytes?


Yes, and a []byte is even going to have at least as much inline overhead: it stores 24 bytes (length, capacity, pointer), and of course plus the 4 bytes of actual data. Even on a 32-bit platform, the []byte inline storage is 12 bytes.


I think the point was that when it's a mere alias, you can end up passing an array of the right size and type that is not an IP address.


If you are using the default library, IPv4 addresses are stored as [16]byte: https://golang.org/src/net/ip.go


I don't think that's what the Godoc says. The IP type is defined as []byte

  type IP []byte
That is not the same as [16]byte.

  // Note that in this documentation, referring to an
  // IP address as an IPv4 address or an IPv6 address
  // is a semantic property of the address, not just the
  // length of the byte slice: a 16-byte slice can still
  // be an IPv4 address.
What I believe that comment is saying is that something that is [16]byte can still be an IPv4 address, but that doesn't mean that all IPv4 addresses are stored as [16]byte. At least that would be my interpretation based on the comment above it:

  // An IP is a single IP address, a slice of bytes.
  // Functions in this package accept either 4-byte (IPv4)
  // or 16-byte (IPv6) slices as input.


IP addresses are used like integers all the time. Ask your friendly neighborhood sysadmin how a network mask works.

(Maybe you want to foreach over an array every time you want to apply one. I'd rather not.)
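That netmask point is easy to show in modern Rust, which can convert an Ipv6Addr to and from u128 - applying a prefix becomes one bitand instead of a loop over 16 bytes (sketch; assumes a prefix length of 1..=128):

```rust
use std::net::Ipv6Addr;

// Compute the network part of an IPv6 address with a single u128 mask.
// Assumes 1 <= prefix_len <= 128 (a shift by 128 would panic).
fn network(addr: Ipv6Addr, prefix_len: u32) -> Ipv6Addr {
    let mask: u128 = u128::MAX << (128 - prefix_len);
    Ipv6Addr::from(u128::from(addr) & mask)
}

fn main() {
    let addr: Ipv6Addr = "2001:db8:1234:5678:aaaa:bbbb:cccc:dddd"
        .parse()
        .unwrap();
    let expected: Ipv6Addr = "2001:db8:1234:5678::".parse().unwrap();
    assert_eq!(network(addr, 64), expected);
}
```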


I realize integer has rather specific meaning in this context. But really, your comment just highlights the issue.

In my world, an integer is a mathematical construct with no particular representation, making things like bitmasks or shifts nonsensical.

If you really want to work with fixed-length bitstrings, why not just have a type for that? Operating on a string of 128 bits should be valid on all such bitstrings, no matter whether those represent a number or a string of code points.

And equally operations on integers should not care about particular bitstrings representations of the number in question.


> In my world integer is a mathematical construct with no particular representation

Your world doesn't map to the reality of silicon and registers, whereas Rust does. As it happens, you can be fixed much more easily than the whole of modern computing.

> If you really want to work with fixed length bitstrings why not just have a type for that?

I don't. I want to work with integers. An IPv6 address is not the hex format that you read--it is a 128-bit integer. You can go read RFC 2460 if you don't believe me, but it's true. It is an integer that I can add and subtract from; I don't add 1 to an octet of an IP address and then do a bunch of carries if I want the next IP address in my network, I add 1 to the IP address. I don't perform some magic operation to determine what a subnet looks like, I bitand the integer. They are inescapably based on the representation used both by my computer and by my network hardware. (As is the performance of both my network hardware and yours. There's a reason that your router doesn't use BCD or whatever.)

There are programming languages that do not represent the underlying system. They are, for the most part, bad at dealing with the kinds of problems Rust is tailored to effectively represent. You can use those. It's pretty presumptuous to suggest that languages designed for lower-level problems accommodate your peculiarity.


Not really. They are never added, subtracted, multiplied, or divided. They are never compared for greater-than or less-than. Possibly they are compared for equality.

About the only mathematical operations I can think of which are ever done to them are bitwise-anding, bitwise-inclusive-oring, and testing for zero (and, as mentioned, equality).


Yes, really. I mean, I add to IP addresses regularly. How else do you enumerate a subnet (like, what do you think nmap does)? How else do you place, say, twelve machines at 172.17.0.120 through .131? If you're doing string concatenation, you are screwing up. Perhaps it's not enough of a screwup to impact what you do--but definitely wrong enough to impact what I do unless I were to do a lot of extra work in modeling an IP address just for the sake of somebody else's spherical cow of uniform density. (And I do mean work I'd have to do--the lack of a decent IP addressing library on the JVM is a recurring pain in my ass, but since they're just integers...)

If you have internalized an IP address as a dotted set of octets expressed in ASCII digits, that's a problem of comprehension. When you don't, it's pretty natural to just use this stuff like any other integer.
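For what it's worth, the "twelve machines at .120 through .131" case is pure integer arithmetic in Rust's standard library, which converts Ipv4Addr to and from u32 (sketch; `enumerate` is a made-up helper name):

```rust
use std::net::Ipv4Addr;

// Enumerate a run of consecutive addresses by treating them as u32s:
// convert to an integer, add, convert back.
fn enumerate(start: Ipv4Addr, count: u32) -> Vec<Ipv4Addr> {
    let base = u32::from(start);
    (base..base + count).map(Ipv4Addr::from).collect()
}

fn main() {
    let ips = enumerate(Ipv4Addr::new(172, 17, 0, 120), 12);
    assert_eq!(*ips.last().unwrap(), Ipv4Addr::new(172, 17, 0, 131));
}
```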


You're not adding addresses together, that doesn't make sense. You're adding ints to addresses, it's a completely different property, even if the representation happens to fit into the same amount of bits. Nothing more than a convenient coincidence.


You're right, I'm not adding addresses together. If you read my post, I say I'm adding to addresses, which you do to--now hold onto your hat--find the next one in sequence. And that, quite obviously, works because they are integers and map to an address space that matches the complete space, from 0x00000000 to 0xFFFFFFFF (and the equivalent 128-bit space for IPv6) of their bit length. It's not "a convenient coincidence". It's the definition of the thing in question.

It's like saying a pointer to RAM isn't an integer. Of course it is, and you add to them every time you de-reference an array in C. That it has additional semantic meaning doesn't mean it stops being a integer. The pedantry you're peddling doesn't fly.


I think what baq is saying is that IP addresses are torsors http://math.ucr.edu/home/baez/torsors.html . It makes sense to give them a type where you can't add two IP addresses to each other (but you can still subtract to get an integer, and add integers).


Sure, fine, but it'll still be an i128 somewhere, yeah? Like, you can put whatever abstraction you want on it, I still need to directly poke those bits and treat it in all ways as fundamentally the same thing as an integer, defined in those terms in order to use integral operations upon them, at the levels that somebody who's worrying about them has to care about. (Also, look at halomru's post for reasons one might intentionally add and subtract addresses from one another in the first place, which kind of holes this argument.)

That is a cool link though, I'm gonna give it a deeper read.


>They are never compared for greater-than or less-than.

    if ip >= IP(10.34.12.3) && ip <= IP(10.34.12.9)
        //is one of our database servers
> They are never added, subtracted,

    if abs(atk1.ip - atk2.ip) < 16
        //same source likely, modify threshold
Netmask are useful, but sometimes doing regular math is a better fit for your problem. Adding an integer to an IP, subtracting IPs or comparing IPs all yield meaningful results.


What is your point? u32/u128 are not mathematical integers. They are actually bit strings, just like IP addresses and unlike mathematical integers.


They are mathematical cyclic groups!


But you do want to compare GUIDs. And hash them, which can involve shifting, adding, xor'ing, multiplying, remaindering, etc.


If you can't already hash/compare an array of bytes, something terrible has happened with your language design.


Not in a way that compiles into one or two machine instructions. So, your array code is an order of magnitude or two too slow. Never mind having to write it at all.

Added: I'm talking about implementing specific, optimized hash functions on 128-bit values (e.g. GUIDs, IPv6 addresses), and not generic hash functions that can take any length input (although many of them speed up linearly in the size of int that gets used internally).


Rust isn't hashcode based. You don't define how to hash a type in that way. You define hashing algorithms which operate on arrays of bytes. All a type has to do to implement hashing is specify how to feed its bytes into a hasher.

Hashers provide conveniences for feeding in a u8, u16, u32, u64, etc., but most hashers just implement this as casting the value to an array of bytes and using the generic implementation. This is because most algorithms are defined in terms of bytes. Evidently there isn't any interesting optimization to do by statically knowing you have a u32 vs a u64 for these algorithms (appears to be the case for SipHash, XXHash, and Fnv).

And indeed, u128 continues this tradition in the PR: https://github.com/rust-lang/rust/pull/37900/files#diff-2327...
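A sketch of what that looks like in practice (`Id` is a hypothetical 128-bit wrapper): the type's only job is to feed its bytes to whatever Hasher the collection hands it:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A hypothetical 128-bit ID wrapper: to be hashable it only has to
// feed its bytes into the generic Hasher it is given.
struct Id(u128);

impl Hash for Id {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write(&self.0.to_le_bytes());
    }
}

fn hash_of(id: &Id) -> u64 {
    let mut h = DefaultHasher::new();
    id.hash(&mut h);
    h.finish()
}

fn main() {
    // Equal values feed equal bytes, so they hash equally,
    // regardless of which hashing algorithm sits behind the Hasher.
    assert_eq!(hash_of(&Id(42)), hash_of(&Id(42)));
}
```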


Whatever one can do on an i128, one can do "manually" on an array of bytes, e.g. coerce to a pair of u64s to avoid having to manipulate each byte individually.
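That manual coercion is a few lines in Rust (sketch; `to_u64_pair` is a made-up helper, and the little-endian split is an arbitrary choice):

```rust
use std::convert::TryInto;

// "Manually" splitting a 16-byte array into two u64 halves, the kind
// of coercion the comment above describes (little-endian layout here).
fn to_u64_pair(bytes: [u8; 16]) -> (u64, u64) {
    let lo = u64::from_le_bytes(bytes[..8].try_into().unwrap());
    let hi = u64::from_le_bytes(bytes[8..].try_into().unwrap());
    (hi, lo)
}

fn main() {
    let mut bytes = [0u8; 16];
    bytes[0] = 1; // low half = 1
    bytes[8] = 2; // high half = 2
    assert_eq!(to_u64_pair(bytes), (2, 1));
}
```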


Well, you can already do operations on 128 bits of data that represent an int, but that's a kind of silly thing to say, because higher-level abstractions are just about making things easier. So now you can do it in one operation rather than several; if that's all you gain, it's still worth it.


Implementing cryptographic hash functions like SHA-3 (see the original Keccak paper for specific results under various 32-bit, 64-bit, and 128-bit native integer hardware support). And implementing other, faster but non-cryptographic hash functions like Spooky, etc.


Two applications off the top of my head: Reed-Solomon encoding and bigint math.


Finance and scientific computing.


Is this for implementing faster high precision floating point? It didn't mention 128 bit float.


For finance probably but it's mostly useful for statistics from what I understand.

I know Fortran and COBOL code that banks use, and that is used for science (tm), which often defines numbers this large for certain operations.

One such case I can think of is a map-reduce over large datasets. Let's say you want a huge sum as the result. 2^128 is a bit bigger than 2^64, and that difference may be big enough to provide the computation needed for getting to Mars rather than getting to the moon on 2^8 machines.


Fun fact: there are ~2^50 millimetres from here to mars


An [i128; 3] would be enough for a discrete Euclidean grid covering 390x the observable universe with nanometer precision. Or 0.39 of the observable universe with picometer resolution.


There are? It looks closer to 2^14. 2^50mm is around 7.5 au. Sorry to nitpick, but I thought this was a super interesting point!


I rounded a bit over-zealously, thanks for nit picking. It is indeed 2^47.6333. (also assuming you meant 47 not 14.)


Well, that's embarrassing. 2^47 it is! :)


I assume you meant to type 47 rather than 14?


What number in finance is bigger than 2^64?


The sum of the whole planet's annual GDPs, expressed as US dollars with a precision of cents, requires about 53 bits. Choose a smaller unit or start considering long time spans, and you can easily get uncomfortably close to overflow.


Why would you express GDPs, which are estimated to an accuracy of millions of US dollars, with a precision of US cents?


Riches are getting richer.


POWER9 has hardware 128 bit float, so likely to start becoming relevant soon.


Measuring time in picoseconds. The current unix timestamp is roughly 2^70 picoseconds, so won't fit in 64 bits.


Can you present any imaginable situation where resolution of arbitrary time/dates to a billionth of a second is not adequate?


Something involving femtosecond lasers maybe?


LHC


You can store a bitboard for western chess perfectly with 64-bit integers. Some other chess variants (like Chinese chess or shogi) need 128 bits for the same techniques, since the board is larger.

I haven't used it, but Julia lets you declare any(?) fixed size bitset. Which would probably come in handy for Go.
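A sketch of the bitboard idea for a 9x10 xiangqi (Chinese chess) board: 90 squares overflow a u64 but fit comfortably in a u128 (the `BitBoard` type and its square numbering are made up for illustration):

```rust
// One bit per square of a 9-file x 10-rank board (90 squares total),
// packed into a single u128.
#[derive(Clone, Copy, Default)]
struct BitBoard(u128);

impl BitBoard {
    fn set(&mut self, file: u32, rank: u32) {
        self.0 |= 1u128 << (rank * 9 + file);
    }
    fn get(&self, file: u32, rank: u32) -> bool {
        (self.0 >> (rank * 9 + file)) & 1 == 1
    }
    fn count(&self) -> u32 {
        self.0.count_ones() // popcount works on u128 too
    }
}

fn main() {
    let mut b = BitBoard::default();
    b.set(4, 9); // a piece on the far rank: bit index 85, beyond u64
    assert!(b.get(4, 9));
    assert_eq!(b.count(), 1);
}
```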


The NS-3 network simulator uses 128-bit integers to measure time, and they have 3 implementations (search the page for int64x64): http://code.nsnam.org/ns-3-dev/file/38d46996c708/src/core/mo...

It would be much easier for them to just build on top of i128.

Also, you can see that one of the implementations is copied from Cairo, so it seems graphics libraries have some use for 128-bit integers too.


I found myself asking the same question. 64 bit integers are useful because you'll never fill up an IntMap with a uint64 key, and 256 bit integers are useful because they offer sufficient (128 bit) collision resistance. In my mind, a uint128 is too big for use as an index, and too small as a GUID.


AES blocks!


(Some) NVMe statistics are specified as 128-bit integers. Not that you need to operate on them, necessarily, or that they'll practically ever exceed 64-bit values, but they are.


128-bit atomics for small structs—very convenient


can be handy for a really big space for ids, especially if you don't want to use strings. also bigger bitsets


...PS2 hacking?

But I don't think rust has support for the Emotion Engine in any case.


Why was this feature accepted whereas 128 bit floats were removed from Rust?


128 bit floats were removed years ago (June of 2014), here's the meeting notes from the time: https://github.com/rust-lang/meeting-minutes/blob/master/wee... (as with all old Rust stuff, please remember that details very much might have changed between then and now.)

In other words, they weren't removed because we fundamentally didn't want them. They were removed because of maintainability, usability, and usefulness concerns.

The RFC with justifications for why this was added (and all the related discussion) is already linked elsewhere in this thread. The same could still happen with f128 in the future.


I wasn't able to find any real justification beyond (a) one vague mention of "some algorithms... such as certain cryptographic algorithms" and (b) the fact that clang supports it. Can you point to something more specific? Justification (b) in particular smells bad to me.


Oh, just found this:

> The Duration type could be simplified with this: instead of using a u64 for seconds with a separate u32 for nanoseconds, it could just be a single u128/i128 count of nanoseconds.

Of course, just a single u64 at ns precision would get you 500+ years of range. But ok.
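The RFC's Duration idea, sketched as round-trip conversions (the function names here are made up):

```rust
const NANOS_PER_SEC: u128 = 1_000_000_000;

// Collapse (u64 seconds, u32 subsecond nanos) into one u128 nanosecond count.
fn to_nanos(secs: u64, subsec_nanos: u32) -> u128 {
    secs as u128 * NANOS_PER_SEC + subsec_nanos as u128
}

// And split it back apart.
fn from_nanos(total: u128) -> (u64, u32) {
    ((total / NANOS_PER_SEC) as u64, (total % NANOS_PER_SEC) as u32)
}

fn main() {
    // The maximum Duration overflows a u64 of nanoseconds, but round-trips fine.
    let total = to_nanos(u64::MAX, 999_999_999);
    assert!(total > u64::MAX as u128);
    assert_eq!(from_nanos(total), (u64::MAX, 999_999_999));
}
```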


Five hundred and eighty-four years? Recorded human history is almost ten times that range and history to which we can get precise dates is nearly five times that range. There are financial debts being paid, today, that exist from nearly the limit of that range (Dutch debts dating from 1624) even if you left a mere quarter of it for the future. So why are you middlebrowing at this?


To be fair, you basically never need scale and precision at the same time. Debts from 400 years ago aren't known down to the nanosecond.


You usually don't need scale and precision at the same time, but a lot of algorithms get a lot simpler if you don't have to worry about rounding errors that get worse with every operation.

Floating point errors are hard to reason about and often make equality a very fuzzy concept. If you're not starved for bandwidth or memory, large fixed-precision numbers are incredibly useful.


Although a large, fixed precision type is probably an easier default, in practice a floating point variable will work at least as well as a 53 bit fixed integer scaled to the range you're interested in. Inexactness through rounding isn't a big deal, because you rarely care about exact equality for inexact measures like time.


I can think of astronomical measures from a couple hundred years ago that are at least second-accurate. Do you want to have DurationSecondsOnly and Duration as types? =)


It's more an argument for floating point.


Floating point is awful if you care about precision.


What scenario would you need more than 53 bits of precision in a time variable for? Even eropple's somewhat excessive subsecond-accurate timing for an event several hundred years ago has plenty of room to spare with 53 bits.


"Absolute" timestamps. If you're using a double to represent, say, epoch time (technically relative to the relatively recent 1/1/1900), your precision has dropped off to about half a microsecond or so. I read GPU profiling results in nanoseconds.

You can work around the problem - time since program start, time since capture start, etc. - of course, this can lead to fun edge cases, where e.g. Windows 95/98 crashed due to a 32-bit milliseconds timer rollover after around 50 days. For comparison, pow(2,53) nanoseconds is only a little over 100 days. Of course, a floating point value won't roll over in quite the same way, but...

The problem is surmountable if you throw enough edge case handling at the problem. Force the devs to be vigilant about choosing the necessary precision, the proper time to measure relative to, for each possible application of anything time related, etc...

... or you could just throw more bits at the problem, and suddenly my accounting software can accurately calculate compound interest on both a 400 year old debt, and a 49 nanosecond debt clearing HFT trades, if that's your kind of thing - without as many edge cases to worry about.

EDIT: Use pow(x,y) formatting since HN collapses x double-star y to simply xy ...
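The precision drop-off described above is easy to demonstrate: at late-2016 epoch magnitudes in nanoseconds, an f64's ulp is 256 ns, so two timestamps one nanosecond apart convert to the same float.

```rust
fn main() {
    let t0_ns: u128 = 1_480_000_000_000_000_000; // ~Dec 2016, in nanoseconds
    let t1_ns = t0_ns + 1;                       // one nanosecond later
    // Values in [2^60, 2^61) are spaced 2^8 = 256 apart in f64,
    // so both integers round to the same floating point value:
    assert_eq!(t0_ns as f64, t1_ns as f64);
    assert_ne!(t0_ns, t1_ns); // the integers, of course, differ
}
```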


Here's another example like your Windows rollover - timing rollover caused the loss of the Deep Impact spacecraft.

"On September 20, 2013, NASA abandoned further attempts to contact the craft.[76] According to A'Hearn,[77] the most probable reason of software malfunction was a Y2K-like problem (at August 11, 2013, 00:38:49, it was pow(2,32) of one-tenth seconds from January 1, 2000)"

https://en.wikipedia.org/wiki/Deep_Impact_(spacecraft)


Yeah, sorry, that was a bit of a knee-jerk response from me. I took TillE's comment to be more a statement of the general (IMO unfounded) mistrust of floating point, and my comment was intended to take it in that light. Once you have values in the domain you're interested in, floating point more than suffices for precision, and generally handles rounding better than the equivalently-sized fixed point variable.

I do agree that a general timestamp needs to be domain-agnostic, and I'm certainly not saying Rust should use it, not least because Rust aims to preserve the semantics of underlying APIs.


It is closely related to "native" time representation, where you can frequently have u64 seconds in the API. Being able to represent them without a trap would be beneficial.


64 bit ns timestamps are ok if you are only recording current events, but are completely insufficient as a general purpose timestamp.


I wasn't involved in the discussion really, so I don't have anything off-hand.

I can see why (b) might smell a bit bad, but think of it this way: we share a backend with clang, and so if they think support is mature enough to ship, then that's a very positive sign.


The x86-64 64x64->128 instructions.
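In Rust terms, that instruction shape is just "widen both operands to u128 and multiply"; the compiler can lower it to a single MUL whose 128-bit product lands in two registers:

```rust
// Full 64x64 -> 128 bit product, no overflow possible.
fn widening_mul(a: u64, b: u64) -> u128 {
    a as u128 * b as u128
}

fn main() {
    let full = widening_mul(u64::MAX, u64::MAX);
    // (2^64 - 1)^2 = 2^128 - 2^65 + 1 = u128::MAX - 2 * u64::MAX
    assert_eq!(full, u128::MAX - 2 * (u64::MAX as u128));
}
```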


When you say "details very much might have changed between then and now", are there any news on bringing in the float128 type? I mean, do you happen to know whether there is any reasonable chance of having it in Rust at all?


Nobody since then has championed it, so there's been no progress. Because of that, I'm not aware of how much effort it would be to add; the compiler has changed a lot since those days. That might make it easier or harder.


As a side note, what are some use cases for 128 floats?


Some numerically unstable problems require a lot of precision to solve successfully. Similarly, well-behaved problems can be solved with 32-bit floats, but many require 64-bit floats.


So, then 64 bits is enough (because the problems are being solved at present under 64 bits)? From personal experience, numerically ill-behaved problems don't benefit from more precision; rather, they benefit from being scaled to be numerically stable. Things like trust parameters, or preconditioners like SSOR.


Interior point methods?


How about bit lookup tables?


For whoever downvoted, the issues of u128 types came up in the discussion for this function:

     fn is_token(c: u8) -> bool
http://kamalmarhubi.com/blog/2015/09/15/eliminating-branches...
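A hedged sketch of what such a table can look like: a single u128 holds one boolean bit for each of the 128 seven-bit ASCII characters, so membership tests become a shift and a mask with no branches on the table itself. The mask below marks only the digits, purely for illustration; it is not the real HTTP token set from the linked post.

```rust
// Bits 48..=57 set: one bit per ASCII character '0' through '9'.
const DIGIT_MASK: u128 = 0b11_1111_1111 << 48; // '0' is ASCII 48

fn is_digit(c: u8) -> bool {
    // The c < 128 guard also avoids an out-of-range shift for high bytes.
    c < 128 && (DIGIT_MASK >> c) & 1 == 1
}

fn main() {
    assert!(is_digit(b'5'));
    assert!(!is_digit(b'a'));
    assert!(!is_digit(200));
}
```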


Sweet! IPv6 perfection. Besides GPUs, does anyone have experience with any processors that support 128 bits natively?


Technically anyone with a modern Intel CPU has access to the SIMD instructions needed for using 128 bit registers.

We probably won't see larger than 64-bit bus/address access for some time, as we don't need it currently.


PCs have had a set of 128-bit registers for like 15 years, both Intel and AMD.

For integer/fixed point math, the functionality is here since SSE2: https://en.wikipedia.org/wiki/SSE2


I've not had a chance to do more than rely on my compiler to target those. I guess I should play with them more.


A couple of pieces of advice for you, then.

Unless you’re planning to do reverse engineering, or planning to work on a compiler, learning assembler is mostly pointless. In most cases, real-world algorithms contain both vector code in the inner loops, and scalar code everywhere else. When coding assembler you need to use it for both, and assembler ain’t exactly user-friendly. Using C or C++ language with SSE intrinsics is the way to go. All modern compilers support them.

Intel’s documentation is the best so far, but there’s no offline searchable version. I’ve created one: https://github.com/Const-me/IntelIntrinsics/releases

Memory layout is king. You need to keep the input and output data SSE-friendly: aligned, dense, sequential access patterns are preferred. This could mean you need to [re]design some parts of your software specifically for SSE.

There’re multiple generations of hardware. When writing manually-vectorized code, the compiler won’t tell you what CPUs it’ll run on. SSE2 is the most compatible. Here’s some statistics about Windows users: http://store.steampowered.com/hwsurvey/ click on “Other settings”

glhf


Risc-V has a 128 bit design (as well as 32 and 64), although no one is currently implementing it.


...And once we get Emotion Engine support, Rust will at long last be usable for PS2 hacking.


A fun rule of thumb is that 1 decimal digit is roughly 3 binary digits (because 2^3 is eight, which is almost ten.)


That's terribly far off. On the other hand, 2^10 and 10^3 are reasonably close at 1024 and 1000. That's good enough for many things. But it's also the difference between a Kibibyte and a Kilobyte, and so forth.


> terribly far off

lg 10 is 3.322

3 is ten percent accurate. 3.333 is one percent accurate.


I've used this many times -- handy for order-of-magnitude sanity checks. Sure, log2(10) is 3.32 or so, but it's still a quick way to remember that 32-bits is good for a billion, and 64-bits is good for a quintillion (ok, 9 quintillion unsigned)
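A quick check of those figures: bits x log10(2) ~ bits x 0.301, rounded up, gives the decimal digit counts.

```rust
fn main() {
    assert_eq!(u32::MAX.to_string().len(), 10);  // 4_294_967_295: "a billion"-ish
    assert_eq!(u64::MAX.to_string().len(), 20);  // ~1.8e19: quintillions
    assert_eq!(u128::MAX.to_string().len(), 39); // ~3.4e38
}
```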


Isn't it easier to remember 10 bits ~= 1000? 32 is about 3 * 10, or 1000 * 1000 * 1000.


Not really. BCDs are (can be) four bits per decimal digit. That's something different though.


If you like this subject take a look at gmplib (gnu multiple precision big number library)


[flagged]


The bug depended on iterating through every item in a hashtable and putting it into a new hashtable AND the new hashtable had to not start with a known capacity. That's a pretty specific edge case, IMO. Plus it's been addressed.

"rumored memory issues" just sounds like FUD.


Memory issues? And a lot of implementations of hash functions turn quadratic. I seem to remember php having quadratic hash table performance. And the bug in rust's code required a very specific usage pattern.

edit: hash table vuln article (2011): http://www.securityweek.com/hash-table-collision-attacks-cou...


> After the quadratic hashes

Reference: http://accidentallyquadratic.tumblr.com/post/153545455987/ru...

(bugs happen, fix is in the queue, life goes on)

> rumored memory issues

Rumors are just rumors. If you (or anyone else) has more than FUD here, please email https://www.rust-lang.org/security.html .


Thanks for the link to the explanation of the hash table bug!

> One proposal is that hash tables ought to retain their insertion order...

This seems to be something that a lot of hash table implementations are converging on:

https://morepypy.blogspot.com/2015/01/faster-more-memory-eff...

Do you happen to have a link to the discussion for the Rust proposal?


Here's the discussion about this problem on Rust's internals forum:

https://internals.rust-lang.org/t/help-harden-hashmap-in-lib...


I don't happen to have a link handy, no. I'm not even sure if it's a formal proposal or something someone happened to suggest one time, off the top of my head.


[flagged]


> You've threatened to kill people on twitter

What?


[flagged]


Do you have any proof? These seem to be pretty big accusations.


Last time this troll popped up, [1] they claimed that I told people to kill themselves. Which I absolutely do not. Now apparently it's threatening to kill other people, which I most certainly have not.

They claimed that I have deleted said tweets so that people wouldn't see them (which also doesn't make much sense) and that they didn't grab a screenshot (which also wouldn't make sense given that it's easy to edit the HTML of a webpage and take a screenshot.)

So uh, yeah.

1: https://news.ycombinator.com/item?id=12970148


I can't believe that I (apparently) have to say this, but I

1. Would certainly not threaten to kill anyone on twitter.

2. Would certainly not threaten to kill someone for reporting a security vulnerability in Rust.

In fact, as I said above, I would prefer that people report them, so that everyone using Rust can be safe from vulnerabilities.


Just want to say I really admire your work. It's sad to see people trolling like this.


Thanks. It's pretty much just a part of my life at this point.


We've banned this serial troll, of course, but the immune response from the community was particularly healthy here.


Hey, I appreciate it. Sorry that apparently some people hate me personally so much it causes an issue. I'll keep it to only one reply in the future.



