Associate Professor, Dept. of Mathematical Sciences
Senior Fellow, San Diego Supercomputer Center
San Diego State University
San Diego, CA 92182-0314
Dec. 2, 1992
SUE was funded primarily by the Division of Advanced Scientific Computing at the National Science Foundation. Additional funding was provided by The Cray Research Foundation.
A Background on the grant
The San Diego Supercomputer Center (SDSC) has taken a proactive role in disseminating information to make supercomputing accessible to a much wider audience. In particular, SDSC has targeted instructors at undergraduate institutions to introduce their students to supercomputers and their use. As part of this effort, SDSC established Supercomputing and Undergraduate Education (SUE). This program enhances the supercomputing expertise of faculty and helps them incorporate supercomputing topics into their curricula and departmental majors.
A major component of this program is a one-week residential summer workshop at SDSC for faculty from primarily undergraduate institutions throughout the U.S. Lecture materials from both the 1991 and 1992 workshops can be obtained via anonymous FTP over the Internet (See handout for details).
Another component of this program is an annual course taught at San Diego State University (SDSU), in which undergraduate students learn about supercomputers and their use. This course is described in the SDSU catalog as follows:
CS 575 - Supercomputing for the Sciences
Interdisciplinary course intended for all science and engineering majors.
o Advanced computing techniques developed for supercomputers
o Overview of architecture, software tools, scientific
computing and communications
o Hands-on experience using supercomputers
Prereq.: Extensive programming background in Fortran or C.
This course has been taught twice.
B Major Themes of the Program
SUE's 1991 and 1992 faculty workshops at SDSC and the undergraduate curricula at SDSU focused on the following topics in supercomputing.
1 Interdependence of Computer Science & Scientific Experts
The workshop faculty represented two groups: those interested in learning discipline-specific applications packages and those interested in using software tools to facilitate programming. To accommodate both groups, we presented an overview of available resources (applications packages and program optimization techniques) and encouraged faculty to seek further information independently. We identified the technical people they could contact for further information on their particular interests.
Similarly, the required programming background for the course reinforced the traditional separation between computer science and science/engineering students by encouraging the former to take the course. In both cases, the participants interacted well as they realized the benefits of working with others with different strengths.
2 CRAY architecture
The course used the text Computer Architecture: A Quantitative Approach, by Hennessy and Patterson (published by Morgan Kaufmann). The sections we used from this text proved invaluable.
We tried to gain an appreciation for the sources of the Cray Y-MP's power and to understand the subtleties of its design as they affect the way a programmer should approach a problem. Therefore, we deliberately skipped many sections of the text that did not relate directly to the Cray Y-MP. (The course notes available via anonymous FTP can provide a guide for other instructors interested in using this text for a similar purpose.) The text was supplemented with documents from Cray Research, Inc., particularly the following titles:
TR-OPT (Rev. D) cf77 & scc Features and Optimization. An excellent, Cray-specific training report covering the Fortran (cf77) and C (scc) compilers. Provides code examples, diagrams, and explanations of crucial vectorization topics (and the conditions that inhibit the compiler), memory organization, performance tools, common optimization techniques, and much more.
TR-YSAAP Cray Y-MP System Architecture for Applications Programmers. Covers material at the assembler level in more detail. Works well with the models developed in Hennessy and Patterson. (Unfortunately, this document is no longer available.)
3 Architecture of parallel supercomputers (Intel iPSC/860 and nCUBE 2)
This was not covered in the undergraduate course at SDSU, but was covered at the faculty workshop by consultants from Intel and nCUBE, who gave introductory lectures on the parallel architectures.
Both the workshop and the course covered the resources
available through the Internet, including:
a. Accessing sources of information
(news groups, anonymous FTP):
nnsc.nsf.net (NSF information site)
oak.oakland.edu (simtel20 mirror)
(Dr. David Kahaner's Japan Bulletins)
b. Communicating with peers (via e-mail)
c. Using FTP to transfer programs between machines (a
crude look at heterogeneous computing)
At many universities, accounts issued to students in computer courses are limited only by the amount of available file space. Therefore, the concept of monitoring CPU usage can be new. The students in the SDSU class were given programming projects on a mainframe at SDSU and were introduced to the concept of accounting using crude UNIX timing tools (e.g., dtime) before they were moved to the Cray.
When they ran programs on the Cray, students and faculty had a fixed amount of CPU time to work with. Students were reminded that they must complete their course projects without exceeding their CPU allocations. The more sophisticated timing and resource monitoring tools were used to ensure that the student projects did not use up more time than they were allocated for the semester.
6 Computer ethics/responsibility
Students in the course chose between two computer-related scenarios and wrote a one-page essay on a scenario, discussing the behaviors of the individuals involved. The essays were graded on a pass/fail basis (only essays that showed a lack of thought or effort "failed"). The goal, for the instructor, was to gain a better understanding of the students' attitudes concerning computer-related issues.
Dr. Dan Sulzbach, Executive Director of SDSC, also gave a week of lectures on computer ethics, which stimulated some very interesting class discussions.
Dr. Sulzbach began his talk by stating:
"I am not a philosopher, not an ethicist, not an expert. I am a computer professional like you. I'm not here to preach because I have no license or authority to do that. I hope only to raise some issues related to computer ethics. I will undoubtedly ask more questions than I answer."
Computer accounts on the Cray Y-MP were distributed only after the week-long discussions on ethics.
Most of the students had no problem with the computer ethics essay, but they typically had little experience in writing up a science-oriented programming report. Therefore, we assigned a programming project that consisted of the following components:
a. Solve a problem on a campus mainframe and document the mainframe's performance.
b. Solve the same problem on the Cray.
c. Compute an enhancement of the problem on the Cray and document the Cray's performance.
Instructor feedback after reading the document from assignment (a) greatly enhanced the organization and content of the final document in (c).
See Section G SDSU course programming assignments in this handout for more details on the exact assignments. Section G also gives the instructor's clarifications and directions to students to help them specify and document their projects.
C SUE Workshop Overview
This one-week workshop surveyed the many resources available at the San Diego Supercomputer Center. In general, the mornings involved presentations by consultants from the following institutions:
San Diego Supercomputer Center (SDSC)
Cray Research, Incorporated (CRI)
Intel Supercomputer Systems
In the afternoons, software demonstrations or open labs were organized. Due to the diversity of backgrounds of the workshop participants, we covered a very wide variety of information at a basic level. Participants were encouraged to discuss the information in more detail with the consultants in the afternoon laboratory sessions. We also provided information on resources available over the Internet to enhance curriculum development in supercomputing.
Technical support was provided by
Cray Research, Incorporated
D SUE Workshop Schedule
MONDAY
What is a Supercomputer? (Dan Sulzbach)
Business (Kris Stewart)
How to login, how to print
DataTree file storage
Effective use of resources/accounting
(batch queues) at SDSC
SDSC Cray User Guide
CS 575: Supercomputing for the Sciences (Kris Stewart)
Hennessy & Patterson text: Computer Architecture: A Quantitative Approach (Morgan Kaufmann)
Responsibility and ethics
Robbins & Robbins, Cray X-MP/Model 24 (Springer-Verlag)
Cray TR-OPT examples
Dr. Lloyd Fosdick's HPSC Overview
pm Afternoon lab session
Run Fosdick-HPSC ch. 7 examples.
Run TR-OPT examples and use performance tools
Run your own "student project" codes
Etan Scherzer (CRI instructor) covers TR-OPT
TR-OPT is an extensive Cray software training workbook which is typically covered at a more leisurely pace over a one-week period. Etan will try to highlight crucial portions and will then be available for individual discussions this afternoon and tomorrow.
Etan will be available all afternoon to answer any Cray-specific questions.
am Cray Application Packages (SDSC Consultants)
Biology (Jack Rogers)
Chemistry (Jerry Greenberg)
CFD (Rich Charles)
Math (Bob Leary)
pm "Teaching Chemistry" (Rozeanne Steckler)
"Teaching Advanced Graphics" (Michael Bailey)
VisLab reserved: (AVS, Insight)
Intro to VisLab and Workstations
am Access to Parallel Machines at SDSC
nCUBE Introduction and Examples (Chuck Niggley,
Intel iPSC/860 (Dancil Strickland, Regional Parallel
Systems Engineer, Intel Corp.)
pm Scalable Version of Wave Equation (Carl Scarbnick, SDSC)
Work through examples on the parallel machines.
nCUBE and Intel consultants will be available for
What do you see as the future of high-performance
Parallel vs. vector
HPSC Curriculum, Dr. Lloyd Fosdick, U. Colorado,
Computational Science, Dr. Geoffrey Fox, Syracuse
Parallel Computing, Dr. Chris Nevison, Colgate
Different Curricula Orientations
B.S. in Computational Chemistry or Computational Science?
E SUE workshop materials via anonymous FTP
Most of the files from the SUE workshop can be accessed via anonymous FTP. To access them, FTP to the host rohan.sdsu.edu, then retrieve files from the directories /pub/sdscinfo/SUE-notes, /pub/sdscinfo/SDSC-info-files or /pub/sdscinfo/Supercomputing-Course-Notes.
A short description of the individual files is contained at the end of this handout.
F Lecture materials and readings
The sections from the following chapters of Computer Architecture: A Quantitative Approach (Hennessy & Patterson) were covered in the first nine weeks of lectures in the course. See the lecture notes available via anonymous FTP for more details on the sections covered.
Chap. 1 Fundamentals of Computer Design
Chap. 2 Performance and Cost
Chap. 3 Instruction Set Design: Alternatives and Principles
Chap. 4 Instruction Set Examples and Measurements of Use
Chap. 5 Basic Processor Implementation Techniques
Chap. 6 Pipelining
Chap. 7 Vector Processors
Chap. 8 Memory-Hierarchy
The goal was to understand the following advanced topics:
Pipelined (segmented) functional units
Chaining of functional units
Memory bank conflicts
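To make the last topic concrete, here is a minimal sketch in Python (chosen only for brevity; the course itself used Fortran and C). It assumes a simple interleaved memory in which word address a lives in bank a mod B. The bank count of 64 and the name banks_touched are illustrative assumptions, not figures taken from the Cray documents. Under that model a strided vector access cycles through B / gcd(B, stride) distinct banks, so a stride that shares a large factor with B keeps returning to the same few banks and risks stalls.

```python
from math import gcd

def banks_touched(stride, num_banks, num_accesses):
    """Return the set of banks hit by a strided access starting at address 0."""
    return {(i * stride) % num_banks for i in range(num_accesses)}

NUM_BANKS = 64  # hypothetical bank count, for illustration only

for stride in (1, 2, 8, 63, 64):
    hit = banks_touched(stride, NUM_BANKS, num_accesses=64)
    # The number of distinct banks equals num_banks / gcd(num_banks, stride).
    assert len(hit) == NUM_BANKS // gcd(NUM_BANKS, stride)
    print(f"stride {stride:2d}: {len(hit):2d} distinct banks")
```

In this model, unit stride and odd strides spread the accesses over every bank, while a stride equal to the bank count sends every access to the same bank.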
G SDSU course programming assignments
As the lecture material was covered, students worked on their first two programming assignments. The goal was to develop a feeling for the trade-off between timings on the SDSU mainframe and the accuracy of approximation schemes. Students were asked to assess how much it costs (measured in CPU time for now) to get a good answer (measured by true error). Since most students had little numerical analysis background, Dr. Stewart provided the original code for the "First Program" problem, and the students were instructed to insert the appropriate timing calls.
First Program 1991 Course
Run and time a Fortran code that solves a two-point boundary value problem.
The dimension of the approximation should be varied and you should time the separate pieces of the solution process. You should document the performance of the SDSU mainframe on this problem and discuss the sensitivity in accuracy and timing.
NOTE: This proved to be a somewhat confusing problem for students new to numerical analysis. There are too many sources of error. As the finite difference grid is refined, the approximation is more accurate. But the linear system that is solved becomes more poorly conditioned, thereby introducing errors.
First Program 1992 Course
Consider the linear system A x = b, where the N by N matrix A is given by
a_{ij} = 1/(i+j-1) (the notorious Hilbert matrix)
The right-hand side, b, will be chosen so that the true solution, x, will be all 1s. Therefore,
b_i = \sum_{j=1}^{N} a_{ij}
You should solve this linear system for various values of N and observe the error incurred (by computing the true error, since we know the true solution x should be all 1s) and the performance (measured by the elapsed CPU time).
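The experiment described above can be sketched as follows. This is a minimal illustration in Python, not the Fortran code used in the course. The solver (Gaussian elimination with partial pivoting), the function names, and the choice of N values are all assumptions made for the example. CPU time is read with time.process_time, standing in for the timing calls students inserted.

```python
import time

def hilbert(n):
    """N-by-N Hilbert matrix, a_ij = 1/(i+j-1) with 1-based i, j."""
    return [[1.0 / (i + j - 1) for j in range(1, n + 1)] for i in range(1, n + 1)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (on copies)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Pick the largest pivot in column k to limit round-off growth.
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            m = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= m * a[k][c]
            b[r] -= m * b[k]
    # Back substitution.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(a[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (b[k] - s) / a[k][k]
    return x

def experiment(n):
    """Return (max true error, CPU seconds) for the N-by-N Hilbert system."""
    a = hilbert(n)
    b = [sum(row) for row in a]      # row sums, so the true solution is all 1s
    start = time.process_time()
    x = solve(a, b)
    elapsed = time.process_time() - start
    error = max(abs(xi - 1.0) for xi in x)
    return error, elapsed

for n in (2, 4, 6, 8, 10):
    error, elapsed = experiment(n)
    print(f"N = {n:2d}  max error = {error:.2e}  CPU time = {elapsed:.6f} s")
```

Tabulating error and CPU time against N in this way gives students the two measurements the assignment asks them to compare.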
Science-Oriented Program to be run on SDSU Mainframe and subsequently on the CRAY
The main project was crucial to this course. Students were going to use the Cray to run this project, and the instructor did not want them to squander Cray resources before they became familiar with their problems. Students were allowed to pick from a selection of problems provided by the instructor, or they could solve a scientific problem from their particular backgrounds or work environments.
Good sources for problem statements were:
"Computing Applications to Differential Equations: Modelling in the Physical and Social Sciences," by J.M.A. Danby; Reston Publishing
"Numerical Methods and Software" by Kahaner, Moler and Nash; Prentice-Hall Publishers.
"Computational Physics" by Koonin and Meredith; Addison Wesley Publishing Company.
The MAIN PROJECT assignment:
a) Get your "science" program running on the SDSU mainframe.
b) Write a report describing your problem and your program's
performance on the SDSU mainframe.
c) Get your program running on the Cray.
d) Extend your problem. For example, use a finer grid spacing
or use more species in an interaction. (This will depend
on your particular problem.)
e) Submit a final report on your Cray project.
Topics to be covered in your report:
a) Your write-up should have a self-contained statement of the problem. The reader should not have to read your code to find out what equations you are working with, or what the specific problem is that you are solving.
b) Give a complete reference to where the problem came from.
c) Define your measure of work so that comparisons can be made when you run on the Cray. You can't talk about "faster" or "better" without a specific measure of performance.
d) Discuss what conclusions are drawn from the problem itself. What is the "science" story revealed by the original problem? Why was this problem solved?
e) You should carefully organize the results. A summary of pertinent results for both the "science" of the problem and the "performance" of the program should be presented. Optionally, include an appendix for more detailed results.
This midterm focused on compiler terminology and related concepts from the Hennessy/Patterson text and the Cray document TR-OPT, including the following:
Jamming Vectorizing Loops
Separating Loops into Vectorizable and Nonvectorizable
Linearizing Nested Loops
Unrolling Loops (vertically and horizontally)
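Two of the transformations listed above can be illustrated with a short sketch. The examples below are written in Python for brevity and are not taken from TR-OPT, whose examples are in Fortran and C. The function names and the unrolling factor of 4 are assumptions. The sketch shows only that the transformed loops compute the same results; it makes no claim about performance on any machine.

```python
def separate_loops(a, b):
    """Two separate passes over the same index range."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
    for i in range(n):
        d[i] = a[i] * b[i]
    return c, d

def jammed_loop(a, b):
    """Loop jamming (fusion): one pass does the work of both loops."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
        d[i] = a[i] * b[i]
    return c, d

def dot_unrolled(a, b):
    """Dot product with the loop body unrolled by a factor of 4."""
    n = len(a)
    total = 0.0
    main = n - n % 4
    for i in range(0, main, 4):
        total += (a[i] * b[i] + a[i + 1] * b[i + 1]
                  + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3])
    for i in range(main, n):   # clean-up loop for leftover iterations
        total += a[i] * b[i]
    return total

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0] * 6
assert separate_loops(a, b) == jammed_loop(a, b)
print(dot_unrolled(a, b))
```

The clean-up loop in dot_unrolled handles lengths that are not a multiple of the unrolling factor, a detail students often overlook when unrolling by hand.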
Midterm Exam from the 1991 Course
The midterm from the 1991 course asked students to write and time DLXV assembly code (developed in great detail in the Hennessy and Patterson text with numerous examples) for the translation of Fortran to perform the matrix/vector multiply, Ax = b, in two different manners.
Column-oriented manner more suitable for vector processors:
x_1 A_{*1} + ... + x_N A_{*N} = b
(where A_{*j} denotes column j of A)
This is a demanding problem, but most students gain a deep understanding of the Cray's vector processor structure.
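The difference between the two loop orderings for the product Ax = b can be sketched at a high level. The exam itself asked for DLXV assembly; the Python below is only an illustration. The row-oriented variant is shown as an assumed counterpart to the column-oriented form, and the function names are hypothetical. In the column-oriented version, the inner loop is a saxpy that adds a multiple of one column of A to the running result.

```python
def matvec_row(a, x):
    """Row-oriented: each b[i] is the dot product of row i with x."""
    n = len(x)
    b = [0.0] * n
    for i in range(n):
        for j in range(n):
            b[i] += a[i][j] * x[j]
    return b

def matvec_column(a, x):
    """Column-oriented: b accumulates x[j] times column j of a."""
    n = len(x)
    b = [0.0] * n
    for j in range(n):
        xj = x[j]
        for i in range(n):          # inner loop is a saxpy on column j
            b[i] += xj * a[i][j]
    return b

a = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
print(matvec_row(a, x))
print(matvec_column(a, x))
```

Both functions return the same vector; they differ only in which loop is innermost, which is the property that matters when the inner loop is mapped onto vector instructions.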
I. Other Educational Programs at SDSC
HPCC and K-8 Education
Jayne Keller, SDSC Education Coordinator
Integrating high-performance computing and communications (HPCC) into the curriculum of primary and secondary schools is critical to the development of the technicians, scientists, and engineers of the future. SDSC offers the following activities to address this need: Supercomputer Center field trip, HPCC half-day in-service workshop, the SDSC road show, and a technology checklist.
Rozeanne Steckler, SDSC Manager, Applications R & D
SDSU Adjunct Professor, Chemistry
The SDSU course Chemistry 596: Chemistry on Supercomputers is designed as an overview of modern computational chemistry with an emphasis on learning to use the major chemistry software packages. This course is not designed as an introduction to theoretical chemistry, but rather a course to introduce experimental chemists to the computational tools available and how to use them in an informed manner. Many aspects of computational chemistry will be introduced with each topic presented in coordinated lectures and labs.
Michael J. Bailey, SDSC Manager of Scientific Visualization
The UCSD course AMES 293 Advanced Computer Graphics for Engineers and Scientists is targeted towards students in engineering or science majors who are interested in applying advanced visualization techniques to solving scientific problems. It is not oriented towards any one major in particular, but is instead directed towards science in general. Students in this course will learn techniques that will allow them to develop and use scientific graphics programs effectively.
Research Experience for Undergraduates at the San Diego Supercomputer Center
Hassan Aref, SDSC Chief Scientist, and
Rozeanne Steckler, SDSC Manager, Applications R & D
SDSU Adjunct Professor, Chemistry
Students work on research projects in fields of interest within the disciplines that make up computational science. Supervisors are faculty at the student's home institution and SDSC staff. Included are workshops on high-performance computing and special lectures on such topics as parallel computing, graphics and scientific visualization, and numerical analysis. Some students, already engaged in computational research with a faculty member, select a topic within that project. Others, with an interest in a certain area, use the REU program to take the first steps. The objective is to give each student a taste of research in computational science, albeit within a condensed time frame. All students are given access to appropriate computer resources at SDSC.
Susan Estrada, Executive Director
The California Education and Research Federation Network (CERFnet) provides a connection to the world via the Internet, giving access to hundreds of databases and over a million users worldwide. Its goal is to promote collaboration among scientists, engineers, and educators in commercial, government, and academic sectors.
CERFnet provides a 24-hour hotline, continuous network monitoring and management, an expert staff, and maintenance support. Begun with support from the National Science Foundation, CERFnet is a project of General Atomics, a San Diego-based research and development company.
Reuben H. Fleet Space Theater & Science Center: Project Oasis
Joseph Deken, Senior Fellow, SDSC
Senior Scientist, RHF
The Reuben H. Fleet Space Theater and Science Center is one of the most highly respected informal science education centers in the nation. Located in Balboa Park's cultural complex, it houses the world's first OMNIMAX theater, more than 60 "hands-on" science exhibits which encourage visitor participation, as well as a multimedia planetarium show.
In July of 1992, SDSC and the Reuben Fleet Center launched a formal collaboration called Project Oasis. The focus of this collaboration is twofold:
1 To develop interactive exhibits and educational programs about high-performance computing and communications for the general public.
2 To develop new technology for interactive exhibits and educational programs using leading edge computing and communications systems, especially computer networking and visualization.
As part of Project Oasis activities, two SDSC consultants gave lectures at the Reuben H. Fleet Space Theater & Science Center recently:
"Computer Visualization I: The Solar System"
by Dave Nadeau, Visualization Specialist at SDSC
"Computer Visualization II: The Antarctic Seafloor"
by Jim McLeod, Visualization Specialist at SDSC
Overview of SUE
This file (readme.SUE) presents a description of the files distributed at the 1991 Supercomputing and Undergraduate Education (SUE) Workshop. These files are available in /pub/sdscinfo/SUE-notes via anonymous FTP from rohan.sdsu.edu.
Access and use of documents statement (accesuse.asc)
Schedule of events for 1992 SUE workshop (agenda.92)
Brief overview of SDSC accounting and SUE resources (accounti.sds)
Obtaining Cray documents (cost) (cray-man.cst)
Overview of Dr. Lloyd Fosdick's HPSC program from U. Colorado Boulder. This is a terrific program in undergraduate High Performance Scientific Computing. (hpsc-ks.rme)
Intuitive introduction to ODEs, Euler's method, SAXPY and their
connection with vector operations. This fits well with Chapter 7 from Hennessy and Patterson. (my-chap7.asc)
Kay A. Robbins and Steven Robbins wrote a book that serves as a good teaching tool for understanding the details of the Cray at the assembler level. The Cray X-MP/Model 24: A Case Study in Pipelined Architecture and Vector Processing was published by Springer-Verlag in 1987. (xmpsim1.asc)
1. Organization of CS 575 Supercomputing for the Sciences, taught at San Diego State University, Spring 1991 and 1992, by Kris Stewart
a. Outline of course (575ovrvw.s92)
b. Initial handout to students (575init.asc)
c. List of student programming projects - most final reports and source code are available as tar files (575proj.asc)
d. A road-map through the lecture notes and how they were used in the CS 575 course (readme.575)
2. Actual course notes for CS 575 based on the text, Computer Architecture: A Quantitative Approach by John Hennessy and David Patterson, Morgan Kaufmann Publishers, 1990, coupled with handouts from Cray Research Incorporated from two documents, TR-OPT and TR-YSAAP. The files associated with the Hennessy and Patterson text are named ph-something.asc; those associated with Cray documents are named cray-something.asc. There are many files - see readme.575 in the anonymous FTP directory /pub/sdscinfo/Supercomputing-Lecture-Notes.
Of particular interest are the files tr-opt3.asc and tr-opt7.asc
Chapter 3 of the CRI document TR-OPT presents an overview of vectorization terminology and examples. This includes a Fortran code that is useful for showing different types of loops and how the compiler identifies them in the output listing. This code should only be compiled, not executed, since the arrays involved are never initialized. Chapter 7 of TR-OPT presents the fundamental optimization techniques. The file tr-opt7.asc contains Fortran code that should be executed, since timing statistics are collected to demonstrate the effects of a programmer's source code on the Cray's performance and the ability of the Fortran compiler to automatically optimize source code.
3. Computer Ethics and Responsibility Section. It was felt that before students were given access to the Cray Y-MP it was essential to have an explicit discussion of "ethics" and "responsibility" coupled with a written essay assignment.
a. Computer ethics assignment (four scenarios)
b. Handout of the lecture given by Dan Sulzbach (ethic-l2.asc)
4. Using the Cray - note students will have spent 6 weeks programming on the SDSU mainframe in Unix prior to moving to the Cray. (readme.3rd)
a. Initial handout to students on Cray use (crayinit.s92)
This handout also discusses the files:
crayfopt.asc (examples of Fortran optimization)
crayc-ex.asc (samples of C codes and techniques)
my-optim.tar (sample tar file for students to
use to become familiar with tar)
b. Man pages for Cray Fortran (cf77, fpp, fmp, cft77) compiling environment (crayacce.asc)
c. Location of sample codes and how to create sample run of Fortran to get listing, marked loops and diagnostics
(cf77 -ZV -Wf"-emx") (crayacce.asc)
d. Man pages for cc and cl for C compiler with listing
e. As in c) above for C to get listing (cl) and diagnostics
(cc -h report=vsi) (crayacce.asc)
I recommend that instructors take extra time to explain reslist (a relatively expensive command which gives students information on their remaining resources), ja (an inexpensive Unix system call with various parameters) and the NQS batch system. You are charged double for all interactive jobs on the Cray at SDSC. Students usually are not familiar with using batch queues, which can reduce the charges on a job to 0.5 times actual use (therefore a four-fold decrease compared with interactive use). Students need to become experienced with these queues to use their finite amount of Cray time effectively.
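The four-fold figure follows directly from the two charging factors quoted above, as this small sketch shows. The factors 2.0 and 0.5 come from the text; the 600-second job and the function name charged_time are hypothetical, chosen only to make the arithmetic visible.

```python
INTERACTIVE_FACTOR = 2.0   # interactive jobs are charged double
BATCH_FACTOR = 0.5         # batch queues can reduce the charge to 0.5 times actual use

def charged_time(cpu_seconds, factor):
    """CPU seconds debited from the allocation for a job."""
    return cpu_seconds * factor

job = 600.0  # hypothetical job that uses 10 CPU minutes
interactive = charged_time(job, INTERACTIVE_FACTOR)
batch = charged_time(job, BATCH_FACTOR)
print(interactive, batch, interactive / batch)
```

The ratio does not depend on the job length, so the same allocation stretches four times as far when work is submitted through the batch queues.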
SDSC Documents. Note that new users of the SDSC Cray will receive the SDSC User Guide. You can obtain additional copies of the User Guide through the doc processor on the Y-MP. Type doc; the file you want is usrguide. This is a very long file, so I would not recommend getting the whole thing. The individual chapters of the User Guide are available as separate files (e.g., ugoptim, ugtools, ugunicos). Other files available from the doc processor (and in the directory /pub/sdscinfo/SDSC-info-files via anonymous FTP from rohan.sdsu.edu):
f. Introduction to UNICOS (unicos)
g. EZFortran, EZC, EZDebug (ezfortrn, ezc, ezdebug)
h. EZStorage (DTI, data tree documentation) (ezstorag)
This file (readme.2nd) presents a detailed overview of the actual lectures of the CS 575 course. These files are available in /pub/sdscinfo/Supercomputing-Course-Notes via anonymous FTP from rohan.sdsu.edu.
The files fall into three classifications:
a) course information and additional examples from the instructor
b) those related directly to the Hennessy & Patterson text describing which sections/topics/concepts were used from each chapter
c) xeroxed copies of pages from the Cray documents TR-OPT
Info and Examples from instructor
575init.asc Initial handout given to students the first day of
charac.asc Handout given to students the second week as we
discussed "What is a Supercomputer?". These were
notes taken from Dr. Dan Sulzbach's talk at the SDSC
Summer Institute in 1990.
assignmt.asc Describes the programming assignments students were
asked to complete during the semester.
my-chap7.asc I wrote this section to try to motivate the idea of
vector registers and operations from the point of view
of science. A major computation performed repeatedly
in scientific computation involves solving ordinary
differential equations (ODE). Presents the idea of an
ODE as a vector system of equations and shows how
Euler's method can be visualized as doing simple
vector operations, a saxpy with scalar h and vectors
y_current, f_current and y_next. Although students do
not have a deep background in numerical analysis,
this has been successful in relating the saxpy to
scientific computation at an intuitive level.
optimiz.doc This is an SDSC document available via the doc processor on the Cray Y-MP. This was FTPed from
the Cray to the SDSU mainframe and students
were encouraged to obtain their own copy.
crayfopt.asc I coded up the examples from the SDSC Optimization
document. These codes are presented in the appendix
of that document and available on the Cray. This
handout has details on accessing the codes, untarring
the codes, running the make utility to compile with
various Fortran optimization flags on or off.
crayc-ex.asc Handout on C codes and how to rewrite them to improve
performance. These are concepts discussed in TR-OPT which I coded up to give students examples of C codes
and the Cray tools to analyze their performance. These
were provided by Etan Scherzer, CRI.
xmpsim.asc The text The Cray X-MP/Model 24, A Case Study in Pipelined Architecture and Vector Processing by
Kay A. Robbins and Steven Robbins (Springer-Verlag) is
an excellent source for explanations and examples of
the performance of the Cray at the assembler level.
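The connection drawn in my-chap7.asc between Euler's method and the saxpy can be sketched in a few lines. This is an illustration in Python, not code from the course notes. The function names and the oscillator test system are assumptions. Each Euler step computes y_next = y_current + h * f_current, which is exactly a saxpy with scalar h.

```python
def saxpy(alpha, x, y):
    """Return alpha * x + y, element by element."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def euler(f, t0, y0, h, steps):
    """Advance the system y' = f(t, y) with Euler's method."""
    t, y = t0, list(y0)
    for _ in range(steps):
        y = saxpy(h, f(t, y), y)   # y_next = h * f_current + y_current
        t += h
    return y

# Hypothetical test system: y1' = y2, y2' = -y1 (a harmonic oscillator).
def oscillator(t, y):
    return [y[1], -y[0]]

y_end = euler(oscillator, 0.0, [1.0, 0.0], 0.001, 1000)
print(y_end)
```

For this test system the exact solution at t = 1 is (cos 1, -sin 1), so the printed values can be checked against a calculator to see the first-order error of the method.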
Details on sections used from Hennessy & Patterson
ph-intro.asc Introductory discussion of the aims and orientation of the course and the use of the text Computer Architecture: A Quantitative Approach by Hennessy and Patterson (Morgan-Kaufmann, Pub., San Mateo, CA)
ph-chap1.asc This chapter establishes a definition of performance, presents Amdahl's Law, and defines and uses terms such as latency and throughput. I added Gantt charts to illustrate the assembly line example to agree with later discussions of pipelined CPUs.
(the handout cray-arc.asc from TR-OPT fits in well here also)
ph-chap2.asc We are only interested in performance in this chapter. The treatment of cost is oriented toward someone designing a computer architecture. In this course we are CONSUMERS of an architecture, not its designers (though, of course, it is a very complex architecture we are studying). This chapter discusses MIPS, MFLOPS and their limitations as measures of performance.
ph-chap3.asc Really only interested in Section 3.7 - The Role of High-Level Languages and Compilers. The Cray Y-MP is a register-register machine. There is a nice example using a graph coloring algorithm for register allocation. Students should try to develop an intuitive idea of what the compiler really does for them. The compiler is the major software tool that aids in effective use of the supercomputer. (the handout cray-com.asc fits well here)
ph-chap4.asc This chapter presents and discusses instruction sets for the VAX, IBM 360/370, Intel 8086 and DLX. DLX is the architecture the book develops for all its examples and is our only interest in this chapter. Section 4.5 presents the DLX instruction set and gives examples of its use. In Chapter 6, pipelining will be presented using this instruction set. This is an important concept for understanding the performance of the Cray. In Chapter 7, this instruction set is extended to DLXV to implement vector processing (and descriptions of chaining, vector stride, strip mining vector loops, and more). So it is essential to work through lots of examples using DLX so that students are comfortable with the material in Chapter 6 and the extensions in Chapter 7. (add cray-scl.asc here)
ph-chap5.asc This chapter establishes the basic steps of execution: instruction fetch, instruction decode and register fetch, execution and effective address calculation, memory access and branch completion, write-back step. Students probably don't realize that at the machine (or assembler) level there are many tasks to be performed to accomplish something as simple as
This will be important when Chapter 6 discusses pipelining. (add the handout cray-con.asc here)
ph-over6.asc Chapters 6 and 7 are the main goal of the course lecture material. The concepts of:
pipeline operation of segmented scalar computational
segmented vector functional units and chained pipelines
vector stride, memory bank conflicts and stalls in the
strip mining vector loops
are covered in these two chapters. Our recurring example will be the saxpy. We first examine Chapter 6 and its sections:
6.1 What is pipelining?
6.2 The basic pipeline for DLX
6.3 Making the pipeline work
6.4 The major hurdle of pipelining - hazards
Introduces forwarding, which is crucial for
understanding chaining of functional units
6.6 Extending the DLX pipeline to handle multicycle
6.8 Advanced pipelining - taking advantage of more
6.12 Historical perspective and references
Important homework problems: 6-11 to 6-19 on saxpy
ph-over7.asc This chapter presents the DLXV instruction set that is used in the examples and homework problems (add handouts cray-ch3.asc and tropt-3.asc here).
phover7b.asc This covers section 7.6 of the text - Enhancing Vector Performance. This section discusses the VERY important concept of chaining. Important homework problems: 7-1 to 7-6, 7-8 to 7-10. Most of these were worked in lecture so students would get lots of practice with timings. (add handouts cray-ch7.asc and tropt-7.asc here)
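A rough way to see why chaining matters in the timing problems is to compare two dependent vector operations with and without it. The C sketch below assumes a deliberately simplified model in which one vector operation on n elements takes (start-up + n) clocks. The function names are hypothetical, and the start-up values used in the example after the code are illustrative rather than figures for any particular machine.

```c
/* Clocks for two dependent vector operations on n elements when the
   second must wait for the first to finish completely. */
long unchained_cycles(long n, long startup1, long startup2)
{
    return (startup1 + n) + (startup2 + n);
}

/* Clocks for the same pair when chained: the second operation starts
   consuming results as soon as the first produces them, so the n
   element times overlap and are paid only once. */
long chained_cycles(long n, long startup1, long startup2)
{
    return startup1 + startup2 + n;
}
```

For example, with n = 64 and start-up times of 7 and 6 clocks, this model gives 141 clocks unchained and 77 clocks chained.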
phover7c.asc The real guts of the classes - end of Chapter 7 of Hennessy and Patterson
7.7 Putting it all together: Evaluating the performance of vector processors
7.8 Fallacies and pitfalls - interesting reading
7.9 Concluding remarks - interesting reading
7.10 Historical perspective and references. VERY interesting reading
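Vector stride and memory bank conflicts, listed among the Chapter 6 and 7 concepts, also affect these performance evaluations. The C sketch below illustrates the usual arithmetic for an interleaved memory: a stream with a fixed stride touches nbanks / gcd(nbanks, stride) distinct banks, and it stalls if it returns to a bank before that bank is free. The function names, the assumption of one access per clock, and the bank counts in the example are all illustrative assumptions.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
static unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Number of distinct banks touched by a fixed-stride access stream
   in an interleaved memory with nbanks banks. */
unsigned banks_touched(unsigned nbanks, unsigned stride)
{
    assert(nbanks > 0);
    return nbanks / gcd(nbanks, stride);
}

/* Nonzero if the stream comes back to a bank while it is still busy,
   assuming one access per clock and a bank busy time given in clocks. */
int stride_stalls(unsigned nbanks, unsigned stride, unsigned busy_time)
{
    return banks_touched(nbanks, stride) < busy_time;
}
```

With 16 banks and a busy time of 4 clocks, a stride of 1 uses all 16 banks and does not stall, while a stride of 8 uses only 2 banks and does.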
Pages from Cray documents TR-OPT and TR-YSAAP
cray-arc.asc These pages included a diagram of the hardware components of the Cray Y-MP, a diagram of the 8 CPUs and their memory and communications sections, a table of the 14 separate functional units with the registers they use and their times (in clock periods), and a block diagram of a single CPU.
cray-com.asc These pages included a quick look at the Fortran Compiling System (cf77) and the Standard C system. They presented some standard ideas for scalar optimization. These are operations performed by the compilers students are accustomed to using on the scalar machines at SDSU (e.g. expression reordering, constant folding, common subexpression elimination). This handout also discusses the phases of the compilation process: source statement processing, scalar optimization, vectorization, and code generation.
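As a small illustration of the three scalar optimizations named above, the C sketch below shows a function as a programmer might write it and a hand-written equivalent of what a scalar optimizer might produce. The function names and the expressions are invented for this example and are not taken from the Cray handout. Integer arithmetic is used so that the two forms agree exactly; reordering floating-point expressions can change rounding.

```c
/* As written by the programmer. */
long product_original(long base, long i)
{
    long a = base + i * 8 * 4;   /* parsed as (i * 8) * 4 */
    long b = 8 * 4 * i + base;   /* parsed as (8 * 4) * i, then + base */
    return a * b;
}

/* Hand-written equivalent of what a scalar optimizer might produce. */
long product_optimized(long base, long i)
{
    /* Expression reordering turns (i * 8) * 4 into i * (8 * 4), which
       exposes the constant 8 * 4; constant folding replaces it by 32.
       Both a and b then reduce to base + 32 * i, a common subexpression
       that is computed once. */
    long t = base + 32 * i;
    return t * t;
}
```

Both functions return 11236 for base = 10 and i = 3.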
cray-scl.asc These pages covered another look at the registers, functional units and memory access paths for the Cray Y-MP.
cray-con.asc These pages cover the control section of the CPU.
cray-ch3.asc Students were given copies of Chapter 3 of TR-OPT. This chapter presents a view of Vectorization on the Cray Y-MP. It has many examples and I found it very readable.
tropt-3.asc I coded up a Fortran code from the examples in Chapter 3 and we went over this in class. I also showed students the tropt3.m file which was produced by cf77's Fortran vectorization preprocessor (fpp). This gets students accustomed to the powerful tools provided on the Cray to aid in optimizing codes.
cray-ch7.asc Students were given copies of Chapter 7 of TR-OPT. This chapter presents a view of common CPU optimization techniques.
tropt-7.asc I coded up Fortran code from the examples in Chapter 7 and ran before-and-after timings for the standard (original) coding and the optimized (modified) coding. Two settings were used in compiling the code: one with all optimization turned off and one with standard optimization. So there are four sets of execution times to be examined. The base case is original/no-optimization, and students can gain a feeling for how smart the cf77 compiling environment is by comparing it with the execution times for the original coding with optimization on. Both the source code (tropt7.f) and tropt7.m, the translated code from fpp, are included.
Actually running on the Cray
This file (readme.3rd) provides details pertinent to actually running codes on the Cray Y-MP at SDSC. These files are available by anonymous FTP from rohan.sdsu.edu in /pub/sdscinfo/Supercomputing-Course-Notes
Students received their own copy of the SDSC User's Guide. This is an excellent document written by the SDSC consultants giving a thorough introduction to the Cray, its programming tools, applications packages and more. Given human nature, the User's Guide's thoroughness tends to dissuade users from just sitting down and reading it cover to cover. I therefore e-mailed students a copy of
sdsc-gid.asc A Road Map for the SDSC User's Guide highlighting portions students should pay particular attention to.
crayacce.asc This gives information which instructors could obtain and then make available to their students on their home machines. It includes especially useful man pages and the location on the Cray Y-MP of some sample Fortran and C codes used in TR-OPT. It also gives my recommended calling options for cf77 and cc/cl for generating informative examples and listings.
Students were given the location of crayfopt.asc and cray-exc.asc on our home machine since they are very long. They contain detailed information that students could examine to familiarize themselves with the SDSC Cray tools without using up their own CPU allocation.
Since students will have a finite amount of CPU time on the Cray, I tried to provide a lot of information that most users would obtain in initial explorations on a new machine. I wanted to avoid having 30 separate students repeat the same explorations. Therefore, students were e-mailed a copy of
initcray.asc a quick update on accessing the Cray from SDSU. Highlights the Unix processors on the Cray and other SDSC specific things - like DTI - that Cray users need to know about.
I think most people will find the initcray.asc file informative, though parts of it are specific to accessing the SDSC Cray from facilities at SDSU. It has hints on Cray usage. It presents a log of an actual session, so that you can see how the logon process proceeds. The Cray news file is read. The doc processor is used to find the file optimiz (which is a very informative SDSC Fortran document). This file is copied to a local file (since we see that it is approximately 100 pages long) so that you can edit it, download it to another machine, and so on.
initcray.asc has names for the off-site printers (at SDSU) that you can have your output sent to. If you generate output on the Cray and do not specify that it be routed to a machine at your particular site, then the output will be sent to you via U.S. mail. This will take a couple of days.