1. 06 Mar, 2011 1 commit
    • select/cray: fix error in 'is_gemini' logic · 6c927b3f
      Moe Jette authored
      The is_gemini logic is too simple: as just observed on a SeaStar system, it
      produces the wrong result if more than one row has NULL coordinates.
      
      This case happens if a blade has been powered down completely, so that the SeaStar
      network chip is also powered off. The routing system recognizes this case and 
      routes around the powered-down node in the torus. It is plausible that in such a
      case the torus coordinates are NULL, since the node(s) are no longer part of the
      torus. 
      
      (It is also possible to set all nodes on a blade down, but leave power switched
       on. The SeaStar chip, which is independent of the motherboard, will continue to
       provide routing connectivity, i.e. the torus coordinates would all be non-NULL.
       The node can do no computing, however, and its ALPS state is "ROUTING".)
      
      Here is the example that revealed this behaviour: one blade, nodes 804-807,
      had been powered down after a system failure.
      
      mysql> select COUNT(*), COUNT(DISTINCT x_coord,y_coord,z_coord) FROM processor;
      +----------+-----------------------------------------+
      | COUNT(*) | COUNT(DISTINCT x_coord,y_coord,z_coord) |
      +----------+-----------------------------------------+
      |     1882 |                                    1878 | 
      +----------+-----------------------------------------+
      
      ==> There are 4 more node IDs than there are distinct coordinates.
      
      mysql> select processor_id,x_coord,y_coord,z_coord from processor\
             WHERE x_coord IS NULL OR y_coord IS NULL OR z_coord IS NULL;
      +--------------+---------+---------+---------+
      | processor_id | x_coord | y_coord | z_coord |
      +--------------+---------+---------+---------+
      |          804 |    NULL |    NULL |    NULL | 
      |          805 |    NULL |    NULL |    NULL | 
      |          806 |    NULL |    NULL |    NULL | 
      |          807 |    NULL |    NULL |    NULL | 
      +--------------+---------+---------+---------+
      
      ==> The corrected query now also gives the correct result (equality):
      mysql> select COUNT(*), COUNT(DISTINCT x_coord,y_coord,z_coord) FROM processor\
             WHERE x_coord IS NOT NULL AND y_coord IS NOT NULL AND z_coord IS NOT NULL;
      +----------+-----------------------------------------+
      | COUNT(*) | COUNT(DISTINCT x_coord,y_coord,z_coord) |
      +----------+-----------------------------------------+
      |     1878 |                                    1878 | 
      +----------+-----------------------------------------+
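
The effect of the extra WHERE clause can be reproduced on a toy table. The following is only a sketch under several assumptions, not the plugin's code: it uses Python's sqlite3 instead of MySQL, the table is a hypothetical miniature of `processor`, and the helper names `make_db` and `looks_like_gemini` are invented for illustration. It also assumes the check reports "Gemini" when there are more rows than distinct coordinate triples, which is inferred from the equality noted above. SQLite has no multi-column COUNT(DISTINCT ...), so a subquery emulates MySQL's behaviour of skipping any triple that contains a NULL.

```python
import sqlite3

# Filter used by the corrected query in the commit message.
NOT_NULL = "x_coord IS NOT NULL AND y_coord IS NOT NULL AND z_coord IS NOT NULL"


def make_db(rows):
    """Build an in-memory stand-in for the 'processor' table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE processor ("
        "processor_id INTEGER PRIMARY KEY, "
        "x_coord INTEGER, y_coord INTEGER, z_coord INTEGER)"
    )
    conn.executemany("INSERT INTO processor VALUES (?, ?, ?, ?)", rows)
    return conn


def looks_like_gemini(conn, skip_null):
    """Assumed check: more rows than distinct coordinate triples means Gemini."""
    where = f" WHERE {NOT_NULL}" if skip_null else ""
    total = conn.execute(f"SELECT COUNT(*) FROM processor{where}").fetchone()[0]
    # Emulates MySQL's COUNT(DISTINCT x_coord,y_coord,z_coord), which ignores
    # any row where one of the expressions is NULL.
    distinct = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT x_coord, y_coord, z_coord "
        f"FROM processor WHERE {NOT_NULL})"
    ).fetchone()[0]
    return total > distinct


# SeaStar-like layout: one coordinate triple per node, one blade (804-807) powered off.
seastar = make_db(
    [(800 + i, 0, 0, i) for i in range(4)]
    + [(804 + i, None, None, None) for i in range(4)]
)
# Gemini-like layout (assumption): two nodes share each coordinate triple.
gemini = make_db([(0, 0, 0, 0), (1, 0, 0, 0), (2, 0, 0, 1), (3, 0, 0, 1)])

naive_seastar = looks_like_gemini(seastar, skip_null=False)
fixed_seastar = looks_like_gemini(seastar, skip_null=True)
fixed_gemini = looks_like_gemini(gemini, skip_null=True)
print(naive_seastar, fixed_seastar, fixed_gemini)
```

On the toy SeaStar table the unfiltered comparison sees 8 rows against 4 distinct triples and so misreports Gemini, while the filtered comparison sees 4 against 4, mirroring the 1882/1878 and 1878/1878 results above.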
  2. 04 Mar, 2011 23 commits
  3. 03 Mar, 2011 16 commits