Abstract
We introduce the strictly in-order core (SIC), a timing-predictable pipelined processor core. SIC is provably timing compositional and free of timing anomalies. This enables precise and efficient worst-case execution time (WCET) and multi-core timing analysis. SIC’s key underlying property is the monotonicity of its transition relation w.r.t. a natural partial order on its microarchitectural states. This monotonicity is achieved by carefully eliminating some of the dependencies between consecutive instructions from a standard in-order pipeline design. We present a formal proof framework based on satisfiability modulo theories that is able to automatically verify SIC’s timing predictability. SIC preserves most of the benefits of pipelining: it is only about 6–7% slower than a conventional non-strict in-order pipelined processor. Its timing predictability enables orders-of-magnitude faster WCET and multi-core timing analysis than conventional designs.
This work was supported by the Deutsche Forschungsgemeinschaft (DFG) as part of the PEP Project - 289264719 and by the Saarbrücken Graduate School of Computer Science which receives funding from the DFG as part of the Excellence Initiative of the German Federal and State Governments.
Appendix: Proof of monotonicity
Here, we provide the proofs of the lemmas stated in the main section; all of them follow by case distinction on the cycle behavior relation.
To prove monotonicity of the strictly in-order pipeline, we use the following, rather technical lemma.
Lemma 3
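(Update enable) Let a, b be two configurations. Furthermore, let \(i \in {\mathcal {I}}\) be an instruction with equal progress in a and b \((a(i) = b(i))\) such that all previous instructions \(j < i\) have progressed at least as far in a as in b \((a(j) \sqsupseteq _{\mathcal {P}}b(j))\). For any given valuation of the free variables in ready, if b advances to the next pipeline stage, a advances as well:
\[ b.\textit{ready}(i) \Rightarrow a.\textit{ready}(i) \quad \text{and} \quad b.\textit{willbefree}(b.\textit{stage}'(i)) \Rightarrow a.\textit{willbefree}(b.\textit{stage}'(i)). \]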
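The progress order and the premise of Lemma 3 can be made concrete. The following is a minimal sketch, assuming a hypothetical stage list (the actual SIC stages are defined in the main text) and representing the progress of an instruction as a (stage, remaining cycles) pair, where, within the same stage, fewer remaining cycles means more progress, as used in the proof of Theorem 1. The names `leq_progress`, `leq_config`, and `lemma3_premise` are illustrative only.

```python
from typing import Dict, Tuple

# Hypothetical stage order for illustration; the actual SIC stages are defined in the main text.
STAGES = ["pre", "IF", "ID", "EX", "MEM", "WB", "post"]
RANK = {s: k for k, s in enumerate(STAGES)}

Progress = Tuple[str, int]    # (stage, remaining cycles in that stage)
Config = Dict[int, Progress]  # instruction index -> progress; smaller index = older

def leq_progress(p: Progress, q: Progress) -> bool:
    """p ⊑_P q: q is at least as far along as p."""
    (s1, c1), (s2, c2) = p, q
    if RANK[s1] != RANK[s2]:
        return RANK[s1] < RANK[s2]
    return c1 >= c2  # same stage: fewer remaining cycles means more progress

def leq_config(c: Config, d: Config) -> bool:
    """c ⊑ d: every instruction has progressed at least as far in d as in c."""
    return all(leq_progress(c[i], d[i]) for i in c)

def lemma3_premise(a: Config, b: Config, i: int) -> bool:
    """a(i) = b(i), and every older j < i has progressed at least as far in a as in b."""
    return a[i] == b[i] and all(leq_progress(b[j], a[j]) for j in b if j < i)

a = {0: ("MEM", 0), 1: ("EX", 2)}
b = {0: ("EX", 0), 1: ("EX", 2)}
print(lemma3_premise(a, b, 1), leq_config(b, a), leq_config(a, b))  # True True False
```

Here instruction 1 has equal progress in both configurations, while the older instruction 0 is one stage further in a, so the premise of Lemma 3 holds for a and b but not with their roles swapped.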
Proof
\({\underline{ready}}\)
Assume \(b.\textit{ready}(i)\). There are two cases. In the first case, \(b.\textit{stage}(i) = MEM \) and \( opc (i) = store \). As \(a(i) = b(i)\), it follows that \(a.\textit{stage}(i) = MEM \) and thus \(a.\textit{ready}(i)\). In the second case, we have \(b.\textit{cnt}(i) = 0\) and, by \(a(i) = b(i)\), also \(a.\textit{cnt}(i) = 0\). For all pipeline stages except pre, \( ID \), and \( EX \), this is sufficient to conclude \(a.\textit{ready}(i)\). For \( EX \), \( ID \), and pre, we prove the claim separately, each by contradiction.
For stage \( EX \), \(\lnot a.\textit{ready}(i)\) implies that a store is pending in a and that i is either a store instruction or a load instruction that misses the data cache. Since \(a(i) = b(i)\) and the valuation of the free variables is fixed, the same holds for i in b. As \(a. stpending (i)\), there is a store instruction \(j < i\) with \(a(j) \sqsubset _{\mathcal {P}}( ST ,0)\). By assumption, \(a(j) \sqsupseteq _{\mathcal {P}}b(j)\). By transitivity, \(b(j) \sqsubset _{\mathcal {P}}( ST ,0)\) and thus \(b. stpending (i)\). It follows that \(\lnot b.\textit{ready}(i)\), which is a contradiction.
For stage \( ID \), \(\lnot a.\textit{ready}(i)\) requires an operand hazard \( ophaz (i)\). This means that there is a load instruction \(j < i\) with \(a(j) \sqsubset _{\mathcal {P}}( MEM , 0)\) that writes one of the operands of i. By \(a(j) \sqsupseteq _{\mathcal {P}}b(j)\), we conclude that \(b(j) \sqsubset _{\mathcal {P}}( MEM , 0)\), i.e., there is an operand hazard in b as well. This contradicts \(b.\textit{ready}(i)\).
For stage pre, \(\lnot a.\textit{ready}(i)\) requires \( brpending (i)\), \( mempending (i) \wedge \lnot ichit (i)\), or \(\lnot \textit{next}(i)\). In all three cases, an argument analogous to the one for \( ophaz \) applies. If a branch j is pending in a, j is also a pending branch in b as \(a(j) \sqsupseteq _{\mathcal {P}}b(j)\). If there is an older instruction \(j < i\) to be fetched next in a, j is also in the pre stage in b and is to be fetched next in b. If a memory operation j is pending in a, j is also a pending memory operation in b as \(a(j) \sqsupseteq _{\mathcal {P}}b(j)\). Thus, if any of the three expressions above evaluates to true for a, it also evaluates to true for b, resulting in \(\lnot b.\textit{ready}(i)\). This is a contradiction.
This concludes the proof for ready.
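The three contradiction arguments share one shape: a blocking predicate of the form "some older instruction j < i of a certain kind has not yet reached a threshold progress value" (e.g. \(( ST ,0)\) for stpending, \(( MEM ,0)\) for ophaz). Because \(b(j) \sqsubseteq _{\mathcal {P}}a(j) \sqsubset _{\mathcal {P}}T\) implies \(b(j) \sqsubset _{\mathcal {P}}T\), any such predicate that holds in a also holds in b. The sketch below checks this exhaustively on a tiny domain. It is an illustration under stated assumptions, not the SIC definitions: the stage list is hypothetical, the register-dependency test of ophaz is folded into a hypothetical `kinds` filter, and remaining cycles are truncated to {0, 1}.

```python
from itertools import product

# Hypothetical stage order for illustration only; the real SIC stage set and
# ordering are defined in the main text. "ST" stands in for the store stage.
STAGES = ["pre", "IF", "ID", "EX", "MEM", "ST", "post"]
RANK = {s: k for k, s in enumerate(STAGES)}

def lt_p(p, q):
    """Strict progress order p ⊏_P q on (stage, remaining cycles) pairs."""
    if RANK[p[0]] != RANK[q[0]]:
        return RANK[p[0]] < RANK[q[0]]
    return p[1] > q[1]  # same stage: more remaining cycles means less progress

def leq_p(p, q):
    return p == q or lt_p(p, q)

def pending(config, opc, kinds, threshold):
    """True if some older instruction of a kind in `kinds` has not reached `threshold`.
    `config` and `opc` list only the instructions older than i."""
    return any(opc[j] in kinds and lt_p(config[j], threshold) for j in range(len(config)))

def find_counterexample(opc, kinds, threshold, max_cnt=1):
    """Exhaustively search pairs (a, b) with a(j) ⊒_P b(j) for all older j
    where the blocking predicate holds in a but not in b."""
    values = [(s, k) for s in STAGES for k in range(max_cnt + 1)]
    n = len(opc)
    for a in product(values, repeat=n):
        for b in product(values, repeat=n):
            if all(leq_p(b[j], a[j]) for j in range(n)):
                if pending(a, opc, kinds, threshold) and not pending(b, opc, kinds, threshold):
                    return a, b
    return None

# stpending-like check (older stores before (ST, 0)) and ophaz-like check (older loads before (MEM, 0))
print(find_counterexample(["store", "alu"], {"store"}, ("ST", 0)))  # None
print(find_counterexample(["load", "load"], {"load"}, ("MEM", 0)))  # None
```

No counterexample exists, which is exactly the transitivity step used in each case above; an exhaustive search is only a sanity check on a finite fragment, not a proof.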
\({\underline{willbefree}}\)
Let \(s = b.\textit{stage}'(i)\) and assume \(b.\textit{willbefree}(s)\). If \(s = post \), \(a.\textit{willbefree}(s)\) follows by definition of \(\textit{willbefree}\). If s is empty in b, i.e., no \(j<i\) is in s, then s must also be empty in a since \(a(j) \sqsupseteq _{\mathcal {P}}b(j)\).
Otherwise, some \(j<i\) is in s such that \(b.\textit{ready}(j)\) and \(b.\textit{willbefree}(b.\textit{stage}'(j))\). Since \(a(j) \sqsupseteq _{\mathcal {P}}b(j)\), either \(a(j) = b(j)\) or \(a(j) \sqsupset _{\mathcal {P}}b(j)\). In the latter case, j is already in a stage beyond s in a; consequently, s is free in a and \(a.\textit{willbefree}(s)\). In the former case, \(a(j) = b(j)\), we can apply Lemma 3 inductively to j, which is in a later stage than i; the premise holds for j because every \(k < j\) also satisfies \(k < i\) and hence \(a(k) \sqsupseteq _{\mathcal {P}}b(k)\). We repeat this argument until we reach either a free stage or stage post. Applying the induction hypothesis, i.e., Lemma 3 for j, yields \(a.\textit{ready}(j)\) and \(a.\textit{willbefree}(b.\textit{stage}'(j))\). This results in \(a.\textit{willbefree}(s)\). \(\square \)
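The recursive structure of willbefree used in this argument can be sketched as follows: a stage is vacated if it is post, if it is empty, or if its occupant is ready and the occupant's next stage is vacated as well, so the recursion ends at a free stage or at post. This is a hypothetical simplification, not the SIC definition: it assumes a fixed stage list, at most one instruction per stage (except post), and takes `ready` as a precomputed map instead of deriving it from hazards.

```python
from typing import Dict

# Hypothetical, simplified stage list; one instruction per stage except post.
STAGES = ["IF", "ID", "EX", "MEM", "WB", "post"]

def next_stage(s: str) -> str:
    return STAGES[STAGES.index(s) + 1]

def will_be_free(s: str, occupant: Dict[str, int], ready: Dict[int, bool]) -> bool:
    """Sketch of willbefree(s): stage s can accept a new instruction in this cycle."""
    if s == "post":
        return True  # post never blocks
    j = occupant.get(s)
    if j is None:
        return True  # s is empty
    # s is vacated only if its occupant j is ready and j's next stage is vacated too.
    return ready[j] and will_be_free(next_stage(s), occupant, ready)

occupant = {"ID": 2, "EX": 1, "MEM": 0}
ready = {0: True, 1: True, 2: False}
print(will_be_free("EX", occupant, ready))  # True: 1 moves to MEM, 0 moves to the empty WB
print(will_be_free("ID", occupant, ready))  # False: instruction 2 is not ready
```

The recursion only inspects stages further down the pipeline, i.e., older instructions, which mirrors the induction on j in the proof.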
Proof of Lemma 1
The progress of an instruction depends on the progress of other instructions exclusively via \(\textit{ready}\) and \(\textit{willbefree}\). By the proof of Lemma 3, \(\textit{ready}\) and \(\textit{willbefree}\) depend solely on the progress of previous instructions. \(\square \)
Proof of Lemma 2
First, we prove \(c \sqsubseteq cycle (c)\) by case distinction on \( cycle \). We write \(c'\) for \( cycle (c)\). Let an instruction \(i\in {\mathcal {I}}\) be given. If i is stalled, \(c'(i) = c(i)\) and thus \(c(i) \sqsubseteq _{\mathcal {P}}c'(i)\). If i counts down its remaining cycles, \(c'(i) = (c.\textit{stage}(i), c.\textit{cnt}(i) - 1)\) and thus \(c(i) \sqsubset _{\mathcal {P}}c'(i)\). If i advances to the next pipeline stage, \(c.\textit{stage}(i) \sqsubset _{\mathcal {S}} c'.\textit{stage}(i)\) and thus \(c(i) \sqsubset _{\mathcal {P}}c'(i)\).
To prove that the inequality is strict, i.e., \(c \sqsubset c'\), it suffices to show that not every instruction in the pipeline is stalled. We show that the instruction farthest down the pipeline is not stalled. Let i be this instruction, i.e., all instructions \(j < i\) have already left the pipeline. All stages after \(c.\textit{stage}(i)\) are empty, which yields \(c.\textit{willbefree}(c.\textit{stage}'(i))\). If \(c.\textit{cnt}(i) > 0\), i is not stalled, as its number of remaining cycles is reduced. If \(c.\textit{cnt}(i) = 0\), then \(c.\textit{ready}(i)\) holds and thus i progresses to the next stage. Even if the current stage of i is \( EX \), \( ID \), or pre, the readiness of i cannot be prevented by operand hazards or pending branches or memory operations, as the pipeline ahead of i is empty. \(\square \)
Proof of Theorem 1
Let c, d be given such that \(c \sqsubseteq d\). We need to prove that \( cycle (c) \sqsubseteq cycle (d)\). We denote \( cycle (c)\) by \(c'\) and \( cycle (d)\) by \(d'\) for short.
For every \(i \in {\mathcal {I}}\), we need to show that \(c'(i) \sqsubseteq _{\mathcal {P}}d'(i)\). We distinguish three possible cases of \( cycle \) applied to c: (1) i is stalled, (2) i counts down its remaining cycles, or (3) i advances to the next pipeline stage.
Pipeline stall
If instruction i is stalled in configuration c, we obtain \(c'(i) = c(i)\). By assumption, \(c'(i) = c(i) \sqsubseteq _{\mathcal {P}}d(i)\). By Lemma 2, \(d(i) \sqsubseteq _{\mathcal {P}}d'(i)\), and by transitivity we conclude that \(c'(i) \sqsubseteq _{\mathcal {P}}d'(i)\).
Remaining cycles countdown
If instruction i reduces its remaining cycles in c, we obtain \(c.\textit{stage}(i) = c'.\textit{stage}(i)\) and \(c'.\textit{cnt}(i) = c.\textit{cnt}(i) - 1\). If \(c(i) = d(i)\), by definition of \( cycle \) we obtain \(c'(i) = d'(i)\). Otherwise, \(c(i) \sqsubset _{\mathcal {P}}d(i)\):
If \(c.\textit{stage}(i) \sqsubset _{\mathcal {S}} d.\textit{stage}(i)\), then, since \(c'.\textit{stage}(i) = c.\textit{stage}(i)\) and, by Lemma 2, \(d.\textit{stage}(i) \sqsubseteq _{\mathcal {S}} d'.\textit{stage}(i)\), we conclude that \(c'.\textit{stage}(i) \sqsubset _{\mathcal {S}} d'.\textit{stage}(i)\) and so \(c'(i) \sqsubset _{\mathcal {P}}d'(i)\).
Otherwise, \(c.\textit{stage}(i) = d.\textit{stage}(i)\) and \(c.\textit{cnt}(i) > d.\textit{cnt}(i)\).
If \(d.\textit{cnt}(i) = 0\), either i advances in the pipeline in d, resulting in \(c'.\textit{stage}(i) \sqsubset _{\mathcal {S}} d'.\textit{stage}(i)\), or i is stalled in d, resulting in \(d'.\textit{cnt}(i) = d.\textit{cnt}(i) = 0 \le c.\textit{cnt}(i) - 1 = c'.\textit{cnt}(i)\).
If \(d.\textit{cnt}(i) \ne 0\), by definition of \( cycle \) we conclude \(c'.\textit{cnt}(i) = c.\textit{cnt}(i) - 1 > d.\textit{cnt}(i) - 1 = d'.\textit{cnt}(i)\).
Pipeline stage advance
We know that \(c(i) \sqsubseteq _{\mathcal {P}}d(i)\), i.e., either \(c(i) = d(i)\) or \(c(i) \sqsubset _{\mathcal {P}}d(i)\). We consider the case \(c(i) = d(i)\) first. As we are in the pipeline stage advance case, we know \(c.\textit{ready}(i)\) and \(c.\textit{willbefree}(c.\textit{stage}'(i))\). Since \(c \sqsubseteq d\), every \(j < i\) satisfies \(d(j) \sqsupseteq _{\mathcal {P}}c(j)\), so Lemma 3 applies with \(a = d\) and \(b = c\) and yields \(d.\textit{ready}(i)\) and \(d.\textit{willbefree}(c.\textit{stage}'(i))\). Consequently, i also advances its pipeline stage in d, which results in \(c'.\textit{stage}(i) = d'.\textit{stage}(i)\). In the second case, \(c(i) \sqsubset _{\mathcal {P}}d(i)\), we know \(c.\textit{stage}(i) \sqsubset _{\mathcal {S}} d.\textit{stage}(i)\) since we are in the pipeline stage advance case. By definition of \(\textit{stage}'\), i moves at most to the next stage, and thus either \(c'.\textit{stage}(i) \sqsubset _{\mathcal {S}} d'.\textit{stage}(i)\) or \(c'.\textit{stage}(i) = d'.\textit{stage}(i)\).
To conclude \(c'(i) \sqsubseteq _{\mathcal {P}}d'(i)\), it remains to prove that \(c'.\textit{cnt}(i) \ge d'.\textit{cnt}(i)\) if \(c'.\textit{stage}(i) = d'.\textit{stage}(i)\). Immediately after the pipeline advance, \(c'.\textit{cnt}(i)\) equals the latency given by \( latency (i)\), and \(d'.\textit{cnt}(i)\) cannot be higher because the number of remaining cycles is never increased during \( cycle \). \(\square \)
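To see the shape of Lemma 2 and Theorem 1 on something executable, the sketch below models a hypothetical toy pipeline (stages pre, IF, EX, post with made-up latencies and no data, branch, or memory hazards) and exhaustively checks, for three instructions, that one cycle makes strict progress on every undrained configuration and that \( cycle \) is monotone w.r.t. the pointwise order. This is not the SIC model and not the SMT-based verification framework; exhaustive enumeration of a tiny state space is a swapped-in technique for illustration, and all names (`cycle`, `valid`, `LAT`, ...) are assumptions of this sketch.

```python
from itertools import product

# Hypothetical toy pipeline, NOT the SIC model: linear stages, fixed per-stage
# latencies, no data, branch, or memory hazards. Index 0 is the oldest instruction.
STAGES = ["pre", "IF", "EX", "post"]
RANK = {s: k for k, s in enumerate(STAGES)}
LAT = {"pre": 0, "IF": 1, "EX": 2, "post": 0}  # hypothetical latencies

def leq_p(p, q):
    """Progress order on (stage, remaining cycles): later stage, or same stage with no more cycles left."""
    if RANK[p[0]] != RANK[q[0]]:
        return RANK[p[0]] < RANK[q[0]]
    return p[1] >= q[1]

def leq(c, d):
    """Pointwise order on configurations: c ⊑ d."""
    return all(leq_p(x, y) for x, y in zip(c, d))

def cycle(c):
    """One clock cycle, computed from the oldest to the youngest instruction."""
    nxt, moved = list(c), [False] * len(c)
    for i, (s, cnt) in enumerate(c):
        if s == "post":
            continue
        if cnt > 0:  # count down remaining cycles
            nxt[i] = (s, cnt - 1)
            continue
        # ready: only the oldest instruction waiting in pre may be fetched next
        ready = s != "pre" or all(c[j][0] != "pre" for j in range(i))
        t = STAGES[RANK[s] + 1]
        # willbefree(t): t is post, or every older occupant of t leaves in this cycle
        free = t == "post" or all(moved[j] for j in range(i) if c[j][0] == t)
        if ready and free:
            nxt[i], moved[i] = (t, LAT[t]), True
    return tuple(nxt)

def valid(c):
    """In-order invariant: an older instruction is strictly ahead of a younger one,
    except that several instructions may share pre or post."""
    for i in range(len(c)):
        for j in range(i):
            sj, si = c[j][0], c[i][0]
            if RANK[sj] < RANK[si] or (sj == si and sj not in ("pre", "post")):
                return False
    return True

def configs(n):
    states = [(s, k) for s in STAGES for k in range(LAT[s] + 1)]
    return [c for c in product(states, repeat=n) if valid(c)]

def lemma2_violations(n):
    """Undrained configurations where cycle does not make strict progress."""
    bad = []
    for c in configs(n):
        nc = cycle(c)
        if any(s != "post" for s, _ in c) and not (leq(c, nc) and nc != c):
            bad.append(c)
    return bad

def theorem1_violations(n):
    """Pairs c ⊑ d for which cycle(c) ⊑ cycle(d) fails."""
    cs = configs(n)
    nxt = {c: cycle(c) for c in cs}
    return [(c, d) for c in cs for d in cs if leq(c, d) and not leq(nxt[c], nxt[d])]

print(lemma2_violations(3), theorem1_violations(3))
```

Empty result lists mean no counterexample within this finite toy model. Because `cycle` only ever inspects older instructions, the sketch also reflects the dependency structure stated in Lemma 1.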
Cite this article
Hahn, S., Reineke, J. Design and analysis of SIC: a provably timing-predictable pipelined processor core. Real-Time Syst 56, 207–245 (2020). https://doi.org/10.1007/s11241-019-09341-z