When we realized that we needed to better understand GNFS, so turned to Johan Håstad and asked him to teach us. It then quickly became apparent to us that GNFS is a very advanced algorithm--it would take us a very long time to implement it ourselves. So, of course we started to investigate the possibility of using the same code that had been used for CWI's record factorization of RSA-155 some months earlier. Fortunately, Nada at KTH where three of us worked had access to CWI's GNFS implementation. Their source code comes with the following warning:
The code is a research product which is reasonably documented, but it is very advisable to know the (complicated) Number Field Sieve factoring method, in order to use the code successfully.
Indeed, we were quite confused after receiving the code. We started out trying to run the different programs included in the program suite, and eventually managed to factorize a 17-digit number after a full afternoon in Johan's office. We alternately read articles about GNFS, looked at the source code, and tried to factorize larger and larger numbers for well over a month. We also ran the first step of the algorithm, the polynomial finder, during this time. The GNFS algorithm consists of the following steps.
The parameters in the second step should be chosen so that it is easy to find many relations, but the number of relations that we need also depends on the parameters. Since we did not have much intuition here, we tried varying the parameters until we thought we could finish the sieving step in reasonable time. Actually, it seems that we did quite a poor job since we used almost twice the computing power compared to what CWI used when breaking RSA-155.
The search for relations is carried out by sieving. This task can easily be shared between many computers, where each one checks a different range of a and b values for possible relations. To coordinate the hundreds of computers we had access to at Nada, we modified the client/server solution used for Stage 9. Each task took the client machines a few hours to complete. Since the siever uses over 100 megabytes of memory, we did not want the program to run on computers that were in use, so the program stopped its computation whenever somebody logged on to the machine.
On August 20, we had obtained 75 million relations, which we suspected might be too few, but nonetheless we moved on to processing (``filtering'') our relations to see how far we had progressed. We were positively surprised in that after the first filtering stage, were we threw away relations that had singleton ideals, there were 37 million relations and 35 million ideals. This is the same as having an linear equation system with 37 million equations and 35 million unknown variables, which implies there are 2 million dependencies between the equations. We had enough relations!
For a successful factorization, one needs just a few dozens of dependencies, so we could just throw away most of the excessive relations. But solving an equation system with 35 million variables--even a very sparse one like this--is a computationally unsurmountable task, so further filtering was necessary. With 2 million redundant equations, we should be able to drastically reduce the equation system size.
It is important to understand that we are looking for an equation - = ( + )( - ). We really only care about if the multiplicity of an ideal is even or odd--if we can create an equation with even multiplicity for all ideals, we have a dependency that might lead to a factorization. We can therefore compute over GF(2).
The second filter pass means combining sets of relations that have an ideal in common. If we combine all relations for such an ideal, that ideal is reduced from the equation system, and we also get fewer relations, although of higher ideal weight. But since we had 2 million redundant equations, we could here simply throw away those equations that became very heavy!
Filtering is an iterative process, where one tries to select parameters that reduce the equation system as much as possible. We finally had about 8.4 million equations and variables and a total matrix weight of about 500,000,000.
The filtering was performed on a dual processor Alpha DS20 with 4GB main memory. This is an extremely powerful system, but filtering is a daunting task even for this machine. It took about 3 weeks before we had achieved results that we were not able to further improve.
During the long filter computer runs, we started to think about how long it would take to solve the linear equation system, and if we could improve the program. The estimated running time to solve this system on the Alpha was 150 days. Simply too much. We could not afford such a long computer run, and although the computer center at UMS Medicis was very generous, we felt that we could not occupy their most powerful machine for such a long period.
The program we used for this was optimized for running on vector computers, which is what CWI used for their record factorization last year; it had taken them 10 days to solve a slightly smaller equation system on a 16-processor Cray C90. We started to rewrite this program so that it would run better on the hardware available for us, and after a few weeks of hard work, we had a version of the program that used both processors of the computer, and would have to run for 37 days.
At this point, we contacted Compaq to get some help for the computation, since we thought this was great opportunity to show how powerful the Alpha is for scientific computations. Compaq generously let us use one of their quad processor ES40 systems. The total running time on this machine was 13 days, which is almost as good as the 16-processor Cray. Thanks to the Alpha processor, we have thus been able to show that it is possible to factorize 155-digit numbers without using expensive vector computers.
The program produced the result on October 4, at 5:05am, yielding 11 true dependencies. We then only had to complete the last step, which is using the dependencies to get the factor. This involves taking square roots of huge ideals in Qn, and took 11 hours. Each dependency gives the desired result with 50% probability, so we started five simultaneous runs of the program in order to have a reasonable probability of success.
One of the runs gave us the factorization of the RSA modulus:
= p78 x 12844205165381031491662259028977553198964984323915864368216177647043137765477