Skip to main content

The Ugly Duckling in factoring aka the filtering steps part I

People that knows me well are well aware that prime numbers have been my obsession since my childhood and they are source of continue interest for me. Actually thanks to cryptography they are a relevant part of my everyday life.
One of the most important problem in cryptography since the discovery of RSA is factoring.
The factoring problem consists of finding the prime numbers p and q given a large number , N = p x q.
Unless you are still convinced that factoring is an easy peasy problem, you should know that, while probably not NP-complete, factoring is indeed reaaally hard.
The faster known method for factoring is currently NFS (Number Field Sieve) and if you are interested in the topic I suggest you to read  this beautiful article from the great Carl Pomerance titled "A Tale of Two Sieves" . But it is not what I wanted to talk about today, mainly because the complete algorithm and all its shades go well beyond my current knowledge.
Today instead I want to talk you about one of the IMHO too often underappreciated (but crucial) steps of NFS that goes under the name of filtering. I became more familiar about this topic while reading Topics in Computational Number Theory Inspired by Peter L. Montgomery where I discovered that the legendary Peter L. Montgomery played a big role in the concrete development of these techniques.
One of the most intriguing part of filtering is that at the first sight these methods look trivial (because well, they are) but my main point, that drove me to write this blog post, is that they are probably one of the most important part of the all  NFS algorithm.
The current NFS record factorization is RSA-768, an integer of 768 bits (232 digits). At the beginning of the filtering step, the matrix had about 47 billion rows and 35 billion columns. After the first part of the filtering step, the matrix had about 2.5 billion rows and 1.7 billions columns. At the end of the filtering step, the matrix used in the linear algebra had about 193 million rows and columns. So the filtering step reduced the size of the matrix by more than 99%!!!
But let's go in order...

..What is this filtering about? 

All the modern factorization procedures consist basically of 3 steps:
  1. Relation Building 
  2. Elimination 
  3. GCD Computation 
Filtering is part of step 2 (Elimination). The main goal of the filtering step is to reduce the size of a large sparse matrix over a finite field in order to being able to compute its kernel (we will see that for the factorization problem what we actually need is the left kernel). We will see that filtering per see is a 4 steps process:
  1. Removing duplicates 
  2. Removing singletons 
  3. Removing cliques 
  4. Merging 
As we said in this example we focus on filtering that is part of Elimination so in order to showcase it, we borrow a simple Relation Building example from "An Introduction to Mathematical Cryptography"

from "An Introduction to Mathematical Cryptography"

Again, without going to much in details, the goal here is to find a subset of all these relations whose product is a square on each side of the equality.  This is equivalent to: the sum of the exponents in all chosen relations must be even. If you wonder why, we are trying to leverage one of the simplest of the identities in all of the mathematics:

X^2 - Y^2 = (X+Y)(X-Y)

But let's not diverge from out today's topic and let's focus on filtering. The problem above can be translated in a linear algebra problem. If one sees the relations as rows of a matrix where each column corresponds to an ideal, the coefficients of the matrix are the exponents of the ideals in the relations. As we are looking for even exponents, one can consider the matrix over GF(2).

Finding a linear combination of relations such as every exponent is even is equivalent to
computing the left kernel of the matrix. So lets translate A to B = A'


As we mentioned before, the matrix that comes out from factoring problem is huge (in the order of billions of row/columns). Our goal is to minimize the size of the matrix before the kernel computation in order to make this operation possible. So let's filter

Removing duplicates

The first step of filtering is extremely trivial. It is simply about removing the duplicate rows in the matrix. As weird at it sounds having those rows is inevitable though in the factorization context (because lattice sieving with many distinct special q-primes will produce identical relations).

Removing singletons 

The rule of removing singleton is as simple as  the removing duplicates above: if there is a row in B that contains only a single entry that is non-zero module 2 them the column containing non-zero entry can not occur in a dependency. Such column, called singleton, can be removed from B (together with the respective row). In order to give an example let's use a simpler matrix then B above called M

Now let's apply the removing singleton filter:

So we removed the first row and column of M. But we are not done yet as you can see this removal generated a new singleton (the first column of the new matrix) so several passes are normally requited before all singleton are normally removed from M. Continuing until the very end may not be worth the effort though (we will see in the next post how to handle this part).

Conclusion

Filtering is a really important step in Number Field Sieve and it is implemented also in important integer factorizatiion tools as  CADO-NFS, Msieve and GGNFS. In this blog post we covered the first to phases of filtering as removing duplicates and removing singleton. Stay tuned for the last two steps removing cliques and merging coming in part II of the blog post series.

For more crypto goodies, follow me on Twitter.



Comments

Unknown said…
Factoring in cryptography is a really passionate topic. Thank you very much for the post. I work in another side of factoring the business area!

Popular posts from this blog

OpenSSL Key Recovery Attack on DH small subgroups (CVE-2016-0701)

Usual Mandatory Disclaimer: IANAC (I am not a cryptographer) so I might likely end up writing a bunch of mistakes in this blog post...

tl;dr The OpenSSL 1.0.2 releases suffer from a Key Recovery Attack on DH small subgroups. This issue got assigned CVE-2016-0701 with a severity of High and OpenSSL 1.0.2 users should upgrade to 1.0.2f. If an application is using DH configured with parameters based on primes that are not "safe" or not Lim-Lee (as the one in RFC 5114) and either Static DH ciphersuites are used or DHE ciphersuites with the default OpenSSL configuration (in particular SSL_OP_SINGLE_DH_USE is not set) then is vulnerable to this attack.  It is believed that many popular applications (e.g. Apache mod_ssl) do set the  SSL_OP_SINGLE_DH_USE option and would therefore not be at risk (for DHE ciphersuites), they still might be for Static DH ciphersuites.
Introduction So if you are still here it means you wanna know more. And here is the thing. In my last blog post I was …

All your Paypal OAuth tokens belong to me - localhost for the win

tl;dr  I was able to hijack the OAuth tokens of EVERYPaypal OAuth application with a really simple trick.
Introduction If you have been following this blog you might have got tired of how many times  I have stressed out the importance of the redirect_uri parameter in the OAuth flow.
This simple parameter might be source of many headaches for any maintainer of OAuth installations being it a client or a server.
Accepting the risk of repeating myself here is two simple suggestions that may help you stay away from troubles (you can always skip this part and going directly to the Paypal Vulnerability section):
If you are building an OAuth client,   Thou shall register a redirect_uri as much as specific as you can
i.e. if your OAuth client callback is https://yourouauthclient.com/oauth/oauthprovider/callback then

DO register https://yourouauthclient.com/oauth/oauthprovider/callbackNOT JUST https://yourouauthclient.com/ or https://yourouauthclient.com/oauth If you are still not convinced here…

Critical vulnerability in JSON Web Encryption (JWE) - RFC 7516

tl;dr if you are using go-jose, node-jose, jose2go, Nimbus JOSE+JWT or jose4j with ECDH-ES please update to the latest version. RFC 7516 aka JSON Web Encryption (JWE) hence many software libraries implementing this specification used to suffer from a classic Invalid Curve Attack. This would allow an attacker to completely recover the secret key of a party using JWE with Key Agreement with Elliptic Curve Diffie-Hellman Ephemeral Static (ECDH-ES), where the sender could extract receiver’s private key.

Premise
In this blog post I assume you are already knowledgeable about elliptic curves and their use in cryptography. If not Nick Sullivan's A (Relatively Easy To Understand) Primer on Elliptic Curve Cryptography or Andrea Corbellini's series Elliptic Curve Cryptography: finite fields and discrete logarithms are great starting points. Then if you further want to climb the elliptic learning curve including the related attacks you might also want to visit https://safecurves.cr.yp.to…