MeerKAT Control and Monitoring (CAM)
Lize van den Heever, CAM Subsystem Manager
Paul Swart, Senior Software Engineer

MeerKAT Phases/Specifications
- 64 x 13.5 m offset Gregorian dishes
  - 1 mm rms surface
  - 15 arcsec pointing accuracy (with approx. 5 arcsec tracking consistency)
- Frequency range 0.59 – 14.5 GHz
- 65k frequency channels (spread over 4 sub-bands)
- L-band sensitivity: Ae/Tsys = 220 m^2/K

Parameter                     Phase 1      Phase 2
Est. completion               2016         2018
Frequency bands (GHz)         1.0 – 1.7    0.59 – 1.1 and 8 – 14.5
RF bandwidth (MHz)            850          6500
Sampling frequency (Gsps)     5            30
Processed bandwidth (MHz)     850          6500
Max baseline (km)             8            8

[Image slide: Offset Gregorian Dish – prelim design]

MeerKAT Project Status
- 3 centres:
  - JHB (some "business" functions, infrastructure and site bid)
  - Cape Town (operations control centre, engineering and science)
  - Karoo (telescope site)
- About 100 people currently employed directly on the project (and growing)
- MeerKAT (and SKA SA) site operational after major infrastructure development
- Site on grid power (with diesel backup), with a 10 Gb fibre connection to Cape Town
- KAT-7 engineering and science test-bed fully deployed on site (7 prime-focus composite dishes)
- KAT-7 turn-o program of commissioning operations
- Continued strong political support
- Good momentum!

[Image slides, titles only]
- KAT-7
- KAT-7 – HartRAO Baseline
- KAT-7 our playground
- MeerKAT Political Support
- Weekly flight to Karoo site
- Cape Town office
- Karoo infrastructure
- KAT-7 Composite Dishes
- Feeds, Receivers & Electronics
- Cold Feed Installation
- Digital Signal Processing
- KAT-7 Correlator (16-element)
- KAT-7 Results
- KAT-7 Early Fringes (2009)
- KAT-7 Cen A (2010): 4 dishes, warm feeds (annotation: Moon size)
- KAT-7 Cen A (2011): 7 cold feeds
- KAT-7 PKS1610-60.5 (2011)
- MeerKAT

MeerKAT Schedule
- MeerKAT System PDR
  - Very successful PDR completed in July 2011
  - Strong international panel approval for the MeerKAT system design
- MeerKAT CAM Requirements memo and interface guideline
  - First draft available; it was supporting documentation for the MeerKAT System PDR
  - Guidelines for communication with Ethernet devices (katcp protocol): the MeerKAT version will be ready by Dec 2011
- MeerKAT CAM Architecture description
  - To be ready for external review towards 2012
  - Including updated requirements and signed-off interfaces
- MeerKAT critical and major milestones
  - MeerKAT Science RFP Selection: Mar 2010 *
  - MeerKAT System Concept Design Review (CoDR): Jul 2010 *
  - MeerKAT System PDR: July 2011 *
  - MeerKAT Receptor 1 Qualification complete: Jun 2014
  - Array release 1: Antennas 2-5, array commissioning: Dec 2014
  - Array release 2: Antennas 6-32, start of early science: Dec 2015
  - Array release 3: Antennas 33-64, full MeerKAT array: Dec 2016
  - MeerKAT (Phase 1) handover to Science Operations: Jul 2017

[Figure slides: MeerKAT CAM Scope; MeerKAT CAM Overview]

MeerKAT CAM components
- Configuration and management components
  - kat conf, kat controller, kat nodemanager
- Communications framework: device proxies and device controllers
  - Core CAM access layer for all hardware and devices over a standard protocol (katcp)
  - Protect hardware from direct access, and expose device monitoring points, commands and logs
- Monitoring components
  - kat store, kat monitors, kat aware, kat logger
- Control components
  - kat scheduler, kat executor, kat subarray manager, kat controller
- User interface components (on site and in Cape Town)
  - kat core and UI libraries, kat portals, user interfaces, archive access tools
- Planning tools
  - Proposal Management Tool and Observation Planning Tool

MeerKAT CAM Numbers
- KAT-7 monitoring (sensors only, excluding logs and alarms)
  - 2 CAM servers with 10-20 processes each
  - ~260 sensors per antenna (x 7 antennas)
  - ~4500 sensors in total
  - ~300 sensors sampled at millisecond-order intervals
  - ~100 sensors sampled at second-order intervals
  - The rest sampled at the default rate of once per 10 s or on event (depending on sensor type); configurable from milliseconds up to minutes
  - ~650 samples per second
  - ~16 GB per hour (compressed) of initial storage; decimation over time is being implemented
- MeerKAT monitoring (estimates)
  - 6 CAM servers with 10-30 processes each
  - ~500 sensors per antenna (x 64 antennas)
  - ~100 000 sensors in total
  - ~2500 sensors sampled at millisecond-order intervals
  - ~100 sensors sampled at second-order intervals
  - The rest sampled at the default rate of once per 10 s and on event; configurable from milliseconds up to minutes
  - ~12 000 samples per second
  - ~0.75-1 TB per hour (compressed) of initial storage, decimated over time

MeerKAT CAM design principles
- Some of the core design principles and implementation decisions:
  - Use TCP/IP over Ethernet as a field bus as far as possible
  - Standardized communications everywhere!
    - Standardize device/component interfacing over a well-defined protocol
    - Standardize device/component behaviour
    - Handle heterogeneity and diversity as low down in the hierarchy as possible
  - Soft real-time as far as possible
    - No time-critical control loops in CAM software
    - Real-time control decentralized down to the devices
  - Incremental development and continuous deployment
  - Verify technology decisions through pilot projects and prototyping

MeerKAT CAM concepts/lessons
- Standardized communications protocol (katcp) is core!!!
  - katcp-python publicly released on PyPI and used by others, such as the CASPER collaboration
  - katcp controllers delivered by subcontractors or developed in-house
- Standardizes
  - startup behaviour and handshaking
  - version, build state and serial number reporting
  - fault codes and failure reporting
  - types, statuses and reporting behaviour of monitoring points
  - types of commands and exception behaviour of commands
  - the logging platform and logging behaviour
- Supports
  - multiple connections and flexible sampling/update rates (strategies) on all levels
  - low-level direct control and monitoring on all levels (including devices) over TCP/IP
  - introspection of connected devices, their monitoring points and their control commands
  - direct telnet connection on any level, down to hardware devices, for troubleshooting
  - exposing even alarms, aggregate sensors, failure codes/messages and some parts of the configuration as katcp sensors
- Dynamic discovery / introspection
  - fluid, in-time detection of the system through introspection of monitoring points and commands, down to device level
  - monitoring points include details such as unit of measure, absolute ranges and min/max values
  - introspection of commands includes help and examples

MeerKAT CAM concepts/lessons
- Device control through a proxy layer
  - Protect access to devices
  - Consistent M&C layer across all hardware devices and software components
  - katcp as close to the hardware as possible in all cases (wrap Modbus, OPC, Ganglia, etc. in katcp controllers)
- Low-level / command-line control through libraries and a scripting interface
  - Powerful core and client libraries in Python serve both interactive users and other system processes, with various access levels
  - Built-in support for exposing monitoring points and commands found through introspection and auto-discovery
  - A powerful Python package: an interactive user shell through IPython, as well as an interface to other components in the system
  - Adapts to the system configuration: connects to what is defined in the configuration
  - IPython scripting: flexible, powerful, scalable, expandable; both an interactive user shell and an interface at the level and component layers
- Remote operations
  - Designed in from the start, with the control room in Cape Town
  - Most GUIs are web-based, with portals in the Karoo and in Cape Town

MeerKAT CAM concepts/lessons
- Fully simulated system
  - Fully simulated system up to and including hardware devices and device controllers
  - A mix of simulated and real hardware devices running concurrently
  - Allows full software development, unit testing and integration testing without depending on hardware availability
  - Regression testing and the continuous build server use the simulated system continuously, forcing 100% alignment with the real world at all times
- Development process: incremental deployment, maturing over cycles
  - Agile development and continuous incremental deployment of functionality
  - Simple early implementations maturing into full-fledged functionality

MeerKAT CAM concepts/lessons
- Homogeneous node/server management
  - Automated deployment of software updates, patches and fixes to all servers
  - The same suite of software is deployed on each node (server), with a single startup service: the nodemanager
  - A single headnode is designated as configuration server and controller
  - The headnode coordinates the servers and controls all nodes by pushing a subset of the configured system to each nodemanager for launching
  - Each nodemanager reports consistently on its running processes
  - All node control (start/restart/stop/halt/power-down) goes through the nodemanagers
  - Looking at doing some of this through VMs in future
- Adaptive system configuration
  - Adaptive systems and flexible configuration are required to support integration and incremental rollout
  - Any combination of real and simulated devices is supported in any configuration
  - Multiple configurations available for Karoo, atp, lab, development and simulated systems, …
  - Powerful and flexible system configuration in human-readable text files
  - The system automatically adapts to the current configuration (e.g. which antennas are available): connections, health displays, etc.
  - Templated for multiples of antennas

MeerKAT CAM concepts/lessons
- Scalability
  - Hierarchical and distributed monitoring for scalability
    - Prevent bottlenecks through design
    - Consistent monitoring and rolled-up reporting on all levels (e.g. comms.ok, sensors.ok, unit.ok, all.ok), providing a single point to check and drill down on errors
    - Consistent failure codes and failure reporting on all levels (e.g. failure codes and messages)
    - Consistent logging on all levels
    - Support for multiple clients with different sampling rates (sensor strategies)
    - Aggregate sensors to collate information across multiple sensors into a single monitoring point
  - Distributed monitoring
    - Monitoring per node: multiple monitor components collecting distributed information
    - Monitoring points gathered locally and stored centrally
    - KAT-7 and MeerKAT write to the central monitor store over a network mount
  - Avoid network traffic bottlenecks
    - Antenna clusters
    - Distributed components
    - Hierarchical reporting rolled up from the bottom layers
  - Design for archive retrieval
    - Adapt the design to support optimized retrieval performance

MeerKAT CAM development approach
- KAT-7 CAM
  - Support for hardware integration, commissioning and low-level direct control over all components
  - "Productionized" the M&C architecture (reworking, rewriting, expanding, enhancing); incorporated lessons from XDM, PED and Fringe Finder
  - Built a solid, robust framework for MeerKAT CAM and proved the underlying design concepts of the CAM architecture
  - Verified architectural components on KAT-7 (and a simulated KAT-64) in terms of scalability, robustness, efficiency and appropriateness for MeerKAT
  - Performed a KAT-64 simulation of the current monitoring architecture
- Towards MeerKAT CAM
  - Continue with the agile approach and incremental deployment
  - No rewrite or fresh start for MeerKAT CAM
  - Expand the KAT-7 CAM subsystem into MeerKAT CAM

MeerKAT SW Development Process
- System Engineering
  - Not diligent enough with the SE process on KAT-7; struggling now because of it
  - MeerKAT is following a much better (but still streamlined) SE process, which is already paying off
  - SE investment pays off later
  - Drive early ICDs and consistent requirements across subsystems/components
- Light-weight iterative process
  - Iterative approach and incremental deployment
  - Early initial implementations with continuous incremental deployment to mature functionality
  - On-line documentation, kept as part of the code base in Subversion
  - Specification record (per component or functional area): analyse and gather requirements, describe the understanding in text format, and review it in the team with SE, the commissioners and the CAM and SP teams; part of the on-line documentation
  - Design record (per component): a guide to the code describing its architecture, aimed at new team members and at engineers, commissioners and operators wanting to know a bit more; part of the on-line documentation
  - Test-driven development, continuous build server and integrated testing
  - Develop against the full simulated system (even before the hardware/component is ready)

Towards the SKA
- Summary: considerations for SKA
  - Standardized device/component protocol: an absolute must
  - Heterogeneous devices/components and diversity: handle as low down in the hierarchy as possible
  - Fully simulated system: an absolute must
  - Client library and scripting interface (with discovery and introspection) for full low-level control, to support early engineering
  - Development process: early initial implementations with continuous incremental deployment to mature functionality
  - Homogeneous node management and deployment
  - Adaptive system configuration: design for it
  - Scalability: hierarchical status reporting and distributed monitoring, synchronised distributed control, archive retrieval
  - Get involved with the ICALEPCS conference (the International Conference on Accelerator and Large Experimental Physics Control Systems) and tap into an impressive body of knowledge with a monitoring and control focus: www.icalepcs.org, http://icalepcs2011.esrf.eu/

What we have: KAT-7 CAM
- Covers the core requirements of MeerKAT control and monitoring:
  - full low-level control and monitoring functionality with engineering interfaces
  - some operational control and monitoring functionality and interfaces
  - reasonable support for remote operations (including alarms and SMS notifications)
  - archive access and browsers
  - various engineering and commissioning displays; GUIs for operators not yet developed
  - full manual control and scripting; no scheduling or subarraying yet
- Succeeded in establishing:
  - the benefits of a standardized communications protocol (katcp) on all levels
  - a flexible and adaptive system configuration through introspection, to support engineering, commissioning and incremental roll-out
  - a fully simulated system up to hardware devices and device controllers, running concurrently with real hardware devices
  - an interactive IPython user shell built on a powerful command-line user library
  - a solid CAM framework that is robust, tested and ready for expansion
  - an agile process with continuous incremental deployment

To do for MeerKAT CAM
- Operational control of the array
  - Features required by an operational instrument, and better support for remote operations
  - Specification and implementation of the kat subarray manager
- Specification and implementation of the observation framework
  - including a simple scheduler, task executor, authorization and authentication, noting of data products, operator logs, observation reports, etc.
- Proposal Management Tool and Observation Planning Tool
  - Hope to adopt/adapt existing tools (or parts of them) used by other telescopes
- Scalable user interfaces
  - GUIs for operators and scientists
  - Enhanced client library, access levels and user interfaces
  - Working with HCI experts from universities for design input

Some M&C challenges for SKA
- Standardized protocol
  - Specify it really early and get it right!!!
  - For hardware, devices and software components
  - Standardize interfacing AND behaviour
- Scalability
  - Obviously in processes, architectural components and networking
  - But also in things like user interfaces and archive retrieval
  - Manage heterogeneous devices/components and diversity as low down in the hierarchy as possible
  - Hierarchical components: design layers of aggregation
  - Each level does local monitoring and reports rolled-up status up the hierarchy
- Identifying scope
  - Specify the scope of M&C carefully
  - Interfacing to other S&C components, and feedback loops required from the signal path
  - The number of M&C points can be limited by specifying down to sub-components, e.g. the allowed number of sensors, logging and exception behaviour, interfaces and common behaviour
- Distributed monitoring
  - Distributed monitor store
  - Central API for accessing historical data
  - Monitoring points: how many, how often, how to split storage; careful design of access to distributed storage
- Distributed control
  - Design for synchronized distributed execution
  - Common API for control and state feedback

Some M&C challenges for SKA
- Access to archived monitoring data
  - For fault finding, debugging, post-mortem and trend analysis
  - Needs careful design
- No-one has built a telescope the scale of the SKA before
  - Don't expect to get the requirements right the first time
  - Allow and plan for it
- Multiple views and M&C flexibility
  - Design generic mechanisms so that pockets of M&C can group and roll up into different views, e.g. health of a subarray, health of a region, health of a specific station
  - Design generic mechanisms, e.g. for rolled-up reporting, drill-down and interrogation
  - Each M&C component should be flexible in slotting into various views/roles: knowing how to report to different parents, even knowing how to present itself, etc.
- Incremental implementation and roll-out
  - No-one has built a telescope the scale of the SKA before: don't expect to get the requirements right the first time
  - Coordinating chunks of functionality between teams
  - Identifying scope and boundaries to fit
  - Timing of delivery between subsystems
  - Feedback loops for user/commissioner input to mature initial implementations, while preventing continuous scope creep and uncontrolled refactoring
- Continuously changing system configuration
  - New receptors rolled out continuously
  - Adapt layers of aggregation

Thank you! Questions?

http://www.ska.ac.za
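The concepts/lessons slides make katcp the core of CAM: one line-based protocol on every level, down to direct telnet sessions on hardware devices. As a rough illustration only, the Python sketch below frames and parses katcp-style lines: `?` requests, `!` replies and `#` informs, with space-separated arguments and special characters escaped. It is not the katcp-python API. The escape table reflects our reading of the katcp specification, and the `example-cmd` request and `ant1.comms.ok` sensor name are hypothetical.

```python
"""Simplified katcp-style message framing; not the katcp-python API."""

# Escapes as we read the katcp specification: each special character in an
# argument is written as a backslash followed by one character.
_ESCAPES = {
    "\\": "\\\\",
    " ": "\\_",
    "\0": "\\0",
    "\n": "\\n",
    "\r": "\\r",
    "\x1b": "\\e",
    "\t": "\\t",
}
_UNESCAPES = {code[1]: char for char, code in _ESCAPES.items()}
_TYPES = {"?": "request", "!": "reply", "#": "inform"}


def escape(arg: str) -> str:
    # An empty argument is sent as "\@" so it survives whitespace splitting.
    if arg == "":
        return "\\@"
    return "".join(_ESCAPES.get(ch, ch) for ch in arg)


def unescape(token: str) -> str:
    if token == "\\@":
        return ""
    out, i = [], 0
    while i < len(token):
        if token[i] == "\\":
            out.append(_UNESCAPES[token[i + 1]])  # KeyError on an unknown escape
            i += 2
        else:
            out.append(token[i])
            i += 1
    return "".join(out)


def encode(mtype: str, name: str, *args: str) -> str:
    """Build one newline-terminated message line."""
    if mtype not in _TYPES:
        raise ValueError(f"unknown message type {mtype!r}")
    return " ".join([mtype + name] + [escape(a) for a in args]) + "\n"


def parse(line: str):
    """Split a message line into (kind, name, unescaped arguments)."""
    head, *rest = line.rstrip("\r\n").split()
    return _TYPES[head[0]], head[1:], [unescape(t) for t in rest]


if __name__ == "__main__":
    msg = encode("!", "example-cmd", "ok", "dish at limit", "")
    print(repr(msg))   # '!example-cmd ok dish\\_at\\_limit \\@\n'
    print(parse(msg))  # ('reply', 'example-cmd', ['ok', 'dish at limit', ''])
```

Because every message is a single printable line, the same traffic can be read or typed by hand in a telnet session, which is what makes the "direct telnet connection on any level" point practical.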
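The scalability slides describe rolled-up reporting (comms.ok, sensors.ok, unit.ok, all.ok) that gives a single point to check, plus drill-down to the failing component. A minimal sketch of that idea follows, assuming the "worst status wins" rule. The status names follow katcp's sensor statuses, but the severity ordering, the `Node` class and the sensor names such as `az.temp` are hypothetical and not taken from the CAM code.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed severity order for roll-up (worst wins); the ordering is our choice.
SEVERITY = ["nominal", "unknown", "warn", "error", "failure"]


def worst(statuses: Iterable[str]) -> str:
    """Return the most severe status; an empty input counts as nominal."""
    return max(statuses, key=SEVERITY.index, default="nominal")


@dataclass
class Node:
    """A hypothetical M&C component with its own sensors and child components."""
    name: str
    sensors: Dict[str, str] = field(default_factory=dict)  # sensor -> status
    children: List["Node"] = field(default_factory=list)

    def rollup(self) -> Dict[str, str]:
        # Local summaries, mirroring comms.ok and sensors.ok from the slides.
        comms = self.sensors.get("comms", "nominal")
        others = [s for k, s in self.sensors.items() if k != "comms"]
        summary = {"comms.ok": comms, "sensors.ok": worst(others)}
        summary["unit.ok"] = worst(summary.values())
        # all.ok folds in each child's all.ok, so one point covers the subtree.
        summary["all.ok"] = worst(
            [summary["unit.ok"]] + [c.rollup()["all.ok"] for c in self.children]
        )
        return summary

    def drill_down(self, path: Tuple[str, ...] = ()) -> Iterator[str]:
        """Yield dotted paths of components whose own unit.ok is not nominal."""
        path = path + (self.name,)
        if self.rollup()["unit.ok"] != "nominal":
            yield ".".join(path)
        for child in self.children:
            yield from child.drill_down(path)


if __name__ == "__main__":
    ant1 = Node("ant1", {"comms": "nominal", "az.temp": "warn"})
    ant2 = Node("ant2", {"comms": "nominal", "az.temp": "nominal"})
    array = Node("array", {"comms": "nominal"}, [ant1, ant2])
    print(array.rollup()["all.ok"])  # warn
    print(list(array.drill_down()))  # ['array.ant1']
```

An operator watching only `array`'s all.ok sees the warning, and drill-down points straight at ant1 without polling every sensor in the array.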
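Several slides mention support for multiple clients with different sampling rates (sensor strategies). The sketch below shows how three clients watching the same sensor could each receive a different stream. It assumes three strategy styles named after katcp's period, event and differential strategies, but it runs offline over a recorded list of (timestamp, value) updates rather than on a live connection, and the function names are hypothetical.

```python
"""Sketch of per-client sensor sampling strategies (period, event, differential)."""
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def event(samples: Sequence[Sample]) -> List[Sample]:
    """Report an update only when the value changes."""
    out: List[Sample] = []
    for t, v in samples:
        if not out or v != out[-1][1]:
            out.append((t, v))
    return out


def differential(samples: Sequence[Sample], threshold: float) -> List[Sample]:
    """Report when the value has moved more than `threshold` since the last report."""
    out: List[Sample] = []
    for t, v in samples:
        if not out or abs(v - out[-1][1]) > threshold:
            out.append((t, v))
    return out


def period(samples: Sequence[Sample], interval: float) -> List[Sample]:
    """Report the latest known value on a fixed timer, whether or not it changed."""
    ordered = sorted(samples)
    out: List[Sample] = []
    if not ordered:
        return out
    i, latest = 0, ordered[0][1]
    tick, end = ordered[0][0], ordered[-1][0]
    while tick <= end:
        # Consume every update that arrived up to and including this tick.
        while i < len(ordered) and ordered[i][0] <= tick:
            latest = ordered[i][1]
            i += 1
        out.append((tick, latest))
        tick += interval
    return out


if __name__ == "__main__":
    readings = [(0.0, 10.0), (0.5, 10.0), (1.0, 10.2),
                (1.5, 11.0), (2.0, 11.0), (2.5, 12.5)]
    per_client = {
        "archiver": event(readings),             # every change
        "display": period(readings, 1.0),        # steady 1 s refresh
        "alarm": differential(readings, 0.5),    # only significant moves
    }
    for client, stream in per_client.items():
        print(client, stream)
```

The trade-off is visible in the output: the 1 s periodic client misses the final jump to 12.5, while the differential client ignores the small 0.2 step. This is why letting each client choose its own strategy keeps traffic low without starving clients that need detail.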
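The node-management and adaptive-configuration slides describe configurations templated for multiples of antennas, any mix of real and simulated devices, and a headnode that pushes a subset of the configured system to each nodemanager. The following is a minimal sketch of those two steps under stated assumptions. The process names (`ant${n}_proxy`, `ant${n}_monitor`) and node names are hypothetical, and the round-robin split is a swapped-in placement rule for illustration, not the CAM headnode's actual policy. Only Python's standard `string.Template` is used.

```python
"""Sketch: expand a per-antenna template and split it across nodemanagers."""
from collections import defaultdict
from string import Template
from typing import Dict, Iterable, List

# Hypothetical per-antenna processes; real CAM configuration files differ.
ANTENNA_TEMPLATE = [Template("ant${n}_proxy"), Template("ant${n}_monitor")]


def expand(antennas: Iterable[int], simulated: Iterable[int]) -> List[dict]:
    """Instantiate the template for each configured antenna, real or simulated."""
    sim = set(simulated)
    return [
        {"name": tmpl.substitute(n=n), "simulated": n in sim}
        for n in antennas
        for tmpl in ANTENNA_TEMPLATE
    ]


def assign(procs: List[dict], nodes: List[str]) -> Dict[str, List[str]]:
    """Headnode side: deal processes round-robin into one subset per node."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for i, proc in enumerate(procs):
        plan[nodes[i % len(nodes)]].append(proc["name"])
    return dict(plan)


if __name__ == "__main__":
    procs = expand(antennas=[1, 2, 3], simulated=[3])  # antenna 3 simulated
    for node, subset in assign(procs, ["node1", "node2"]).items():
        print(node, subset)
```

Adding a receptor then only means extending the antenna list: the template generates its processes, and the next push from the headnode gives each nodemanager its updated subset.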