Introduction

Alan Turing outlined his ideas and the equations supporting them in his 1952 paper 'The chemical basis of morphogenesis' [18]. In that paper, a complex pattern resembling an animal's fur is generated by a mathematical formula, and such patterns are related to the Turing structure. Indeed, simple rules can generate complex phenomena in nature, some of which even resemble life. As shown in Fig. 1, turbulence can be generated by convection [9], and a pattern similar to that of a shell can be generated by rule 30 of a cellular automaton [20]. Similarly, a human may learn to play some games quickly, without much training, once the rules are known. We therefore ask: can artificial intelligence be generated from simple rules, and can such intelligence adapt to out-of-sample conditions? (Fig. 1).

Fig. 1

Examples of complex phenomena and simple rules. Turbulence seems very chaotic, yet it can be generated from simple rules; we simulated the process by convection between cold and hot regions. a and b show turbulence in a plane and in a cup. The pattern generated by iterating the simple rule 30 of the cellular automaton in c is similar to the pattern on the shell in d. Sparse coding and IFS (iterated function systems) also show that basic abstract features can constitute complex phenomena

Recently, statistical learning methods have shown notable capabilities. However, these models need large amounts of labeled data carrying complete information about the system's characteristics [4, 10, 17]. In addition, the models are hard to interpret, they serve a single function, and they cannot predict in out-of-sample conditions [8, 12]. For example, a model is hard to interpret because it is a 'black box' [12, 14, 16]: the structure and working process inside the box are unknown. Yet the structure matters [6], because data with different characteristics need networks with different structures; CNNs are suited to images and RNNs to time sequences. It is difficult for a generic 'black box' to learn data with different features well without knowing what is inside the box.

In contrast, rule-based models have better interpretability. However, rules learned directly from data by statistical learning may lack mathematical logic [13, 21]. Our method is based on 'DNA rules', which differ from statistically learned rules because they are real causalities that exist in the system. We call them 'DNA rules' because they resemble DNA in some respects. Such a model can be interpreted through its causality. A biological mutation experiment showed that mutations considered random produced the same results under the same conditions in several experiments [2], which implies that there is certainty and causality even in complicated phenomena. The method we propose is based on a similar view: we believe there is causality and certainty between causes and results, this causality can be mined, and we can then apply it deductively to generate the evolution process of the system. This process is logically strict and deterministic, and the model can be interpreted through the causality. In addition, the method can generate a complete attractor corresponding to all possible states in the system's evolution. Therefore, compared with statistical models, the model also works well in out-of-sample conditions; with the attractor, it is more complete and adapts better to changes. Moreover, the causality here works not only in traditional deterministic systems such as stable points and limit cycles but also in seemingly random systems that traditional statistical methods treat as stochastic. The method shows that such seemingly random systems can be mined, learned, and predicted in the short term. It can be applied to control and prediction in complex, nonlinear evolutionary systems and could enrich current models.

The rest of this paper is organized as follows. Section "DNA rules" presents the concept of DNA rules. Section "Our method" describes our method and the related concepts in detail. Section "Causality can be obtained" shows how the causality can be obtained. Section "Experiment" reports experiments based on the causality-based method. Lastly, the conclusion discusses and outlines the advantages of the model, and conclusions and some inferences are proposed. The method is shown to be theoretically feasible in the supplement, where the properties of the model are proved.

DNA rules

DNA rules are the basic rules: they are the causal relationships in the system rather than rules learned by statistical methods. Statistical rules may be wrong [12], because the data may mislead the model, but the causality of the system itself must be right. The DNA rules correspond to the system's complete hypothesis space of solutions (Theorem 3 in the Supplement) and can generate the attractor of the system, which corresponds to the possible evolution processes of the system. In contrast, statistical rules may not be the basic rules, and their hypothesis space is not complete because of the constraints built into the rules. Using building blocks as an analogy, DNA rules are the basic blocks, while traditional statistical rules are blocks that have already been shaped and fixed; clearly the basic blocks can form more shapes. Table 1 lists the main differences between DNA rules and traditional rules. A model based on DNA rules is not only interpretable but also adapts better to changes than a data-driven model [8].

Table 1 Main differences between DNA rules and traditional rules

Our method

The model we propose is based on the belief that the evolution of a system is driven by causality. The causality is composed of a goal and DNA rules. The goal is fixed and general, applying regardless of the state of the system; an example is the principle of generalized energy minimization, by which the system moves with 'minimum energy consumption' in a generalized sense, such as following the gradient direction or collecting more rewards. The DNA rules are the basic causal rules of the system. Since the goal and the DNA rules are fixed, the causality is fixed, and the process of system evolution is logically strict and deterministic. From the perspective of system evolution, such a causality-based evolution process can be expressed by Formula (1), where X is the state of the system, A represents the deterministic laws of the system, and B is the external input from the environment. The deterministic laws are composed of the DNA rules and the general fixed goal; they are the laws of nature that exist in the system. A is deterministic and B is uncertain:

$$ \dot{X} = AX + B. $$
(1)
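As a minimal illustration only (not the implementation used in the paper), Formula (1) can be iterated numerically with a forward Euler step; the matrices A and B below are arbitrary placeholders chosen for the sketch:

```python
import numpy as np

# Minimal sketch: iterating Formula (1), dX/dt = A X + B, with forward Euler.
# A (deterministic law) and B (external input) are placeholder values, not paper values.
A = np.array([[0.0, 1.0],
              [-1.0, -0.1]])     # assumed linear "law" of the system
B = np.array([0.0, 0.5])         # assumed constant external input
X = np.array([1.0, 0.0])         # initial state
dt = 0.01

trajectory = [X.copy()]
for _ in range(5000):
    X = X + dt * (A @ X + B)     # one causal evolution step
    trajectory.append(X.copy())
trajectory = np.array(trajectory)  # sampled evolution of the system
```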

From the phase-space point of view, the method can generate attractors through the iterative process driven by the causality [5, 15, 18]; the Lorenz attractor in Fig. 2 is an example. An attractor is a set of numerical values toward which a system tends to evolve for a wide variety of starting conditions [1]. It reflects the system's evolution process and delimits a range in which the system is more stable. According to Theorem 1, attractors can be classified by the Lyapunov exponent.

Fig. 2

DNA rules and DNA. DNA defines life, and the DNA rules in the figure define the Lorenz system. With the DNA rules, the evolution process of the Lorenz system can be accurately generated in the form of an attractor. Such a rule differs from the rules in an expert system: it is a rule with an interpretable causal relationship. At the same time, systems based on DNA rules are highly adaptable, just as twins who grow up in different environments behave differently because of adaptation
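For concreteness, the Lorenz attractor of Fig. 2 can be reproduced by iterating the Lorenz equations, which play the role of the system's DNA rules here. The sketch below is our own minimal version using the classical parameters σ = 10, ρ = 28, β = 8/3 and a simple Euler step:

```python
import numpy as np

def lorenz_step(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0, dt=0.005):
    """One iteration of the Lorenz system's basic ('DNA') rules."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

state = np.array([1.0, 1.0, 1.0])   # arbitrary initial condition
orbit = np.empty((50000, 3))
for i in range(len(orbit)):
    state = lorenz_step(state)
    orbit[i] = state                # the points trace out the strange attractor
```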

Theorem 1

In mathematics, the Lyapunov exponent of a dynamical system is a quantity that characterizes the rate of separation of close orbits [1]. The orbits are the trajectories of the system’s evolution in phase space.

For a discrete system, the average Lyapunov exponent is given by Formula (2) [3], where λ is the average Lyapunov exponent, \( f \) represents the iteration map, and n is the number of iterations:

$$ \lambda \left( x_{0} \right) = \lim_{n \to \infty } \frac{1}{n} \sum_{i = 0}^{n - 1} \ln \left| f^{\prime } \left( x_{i} \right) \right|. $$
(2)

The directions of the initial separation vectors of the system differ, and so do the corresponding separation rates. Among them, the maximal Lyapunov exponent (MLE) determines the predictability of the system [19]. When the Lyapunov exponents in all directions are negative, the attractor is a stable attractor; such a system is easy to predict because it converges to a stable point. When all the Lyapunov exponents are greater than zero, the system diverges and cannot be further predicted. When the MLE is greater than zero but the system is not infinitely divergent, which means some Lyapunov exponents are less than or equal to zero, the system has a strange attractor. The classification is shown in Table 2.

Table 2 Classification of systems according to the Lyapunov exponent
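As a worked example of Formula (2) (our own illustration, not a system from the paper), the sketch below estimates the average Lyapunov exponent of the logistic map x_{i+1} = a x_i (1 - x_i); at a = 4 the estimate is positive (about ln 2), indicating the chaotic regime of Table 2, while at a = 2.5 it is negative, indicating convergence to a stable point:

```python
import numpy as np

def lyapunov_logistic(a=4.0, x0=0.2, n=100000, discard=1000):
    """Estimate Formula (2) for the logistic map f(x) = a*x*(1-x)."""
    x = x0
    total = 0.0
    for i in range(n + discard):
        x = a * x * (1.0 - x)
        if i >= discard:                               # skip transient iterations
            total += np.log(abs(a * (1.0 - 2.0 * x)))  # ln|f'(x_i)|
    return total / n

print(lyapunov_logistic(a=4.0))   # positive (about 0.693 = ln 2): chaotic
print(lyapunov_logistic(a=2.5))   # negative: orbit converges to a fixed point
```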

Because the MLE is positive, the orbits in the system's attractor separate. Therefore, the system will leave its current orbit when there is external interference, which means the attractor is sensitive to changes and its orbits are unstable. At the same time, the orbits converge to a certain range in phase space, so the attractor has dense orbits within that range. These features make the evolution of the system seem random and uncertain, especially under interference, because there appear to be many possibilities for orbital migration.

However, there is certainty in this seemingly random process. The DNA rules and the goal are fixed, so the causality is fixed. Under the constraints of the causality, the attractor of the system and its orbits are determined. Even when there is interference, the evolution of the system is still subject to the causality. The system's attractor can be obtained from the causality or by data-reconstruction methods. The process of attractor generation also depicts how the system changes unstable orbits in response to external interference; through the fixed goal, the attractor is coupled with the interference. The more iterations are performed, the more closely the method approximates the shape and orbits of the attractor. We emphasize that it is not necessary to traverse all points of the attractor: an approximate shape and some of its inner orbits are enough. System values that get close enough to the attractor remain close even if slightly disturbed. Moreover, because the causalities are fixed, the attractor has fractal features [1], so we can also complete the attractor using these fractal features. Therefore, the method can also work for large problems.

The system attractor is a prerequisite for complex decision making, that is, learning, adapting to, and predicting complex behaviors. The process of generating the attractor is the process of learning the system. Two constraints and a fixed goal ensure that the model can learn the system and effectively predict and adapt. The first constraint is that the Lyapunov exponent is greater than zero but not infinite; the rate of separation of orbits is therefore limited, which means system values that get close enough to the attractor remain close even if slightly disturbed. The second constraint is that we know the current state of the system. In addition, the goal is fixed. In a system evolving under interference, the fixed goal is the law of the system's passive response to external interference, and it drives the system to adapt during the iterative evolution. For example, the goal can be following the gradient direction or collecting more rewards, both of which essentially follow the lowest-energy principle. Because the goal is fixed and the attractor is sensitive to changes, the adaptation to changes based on the goal happens spontaneously. External interference may be fatal for statistical models; in our method, however, the instability of the attractor is precisely what makes the model adaptable. In addition, according to the fixed goal, the responses of the system to interference are recorded as orbital migration in the generated attractor. Furthermore, a local manifold can be constructed around the system's current position in the attractor, and short-term prediction can be realized on this local attractor manifold because the rate of separation of orbits is not infinite (Theorems 1, 5 and 6 in the Supplement).
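One simple way such local-manifold short-term prediction could be realized is a nearest-neighbour predictor on the sampled attractor: find the stored states closest to the current state and use where their orbits went next. The sketch below is our own minimal illustration of this idea, not the exact procedure of the paper:

```python
import numpy as np

def local_predict(orbit, current, k=10):
    """Predict the next state from the successors of the k nearest attractor points.

    orbit:   array of shape (N, d), sampled states along the attractor
    current: array of shape (d,), the system's current state
    """
    dists = np.linalg.norm(orbit[:-1] - current, axis=1)  # last point has no successor
    nearest = np.argsort(dists)[:k]                       # indices of the k closest states
    return orbit[nearest + 1].mean(axis=0)                # average of their successors

# Example usage with the Lorenz orbit generated earlier:
# prediction = local_predict(orbit, orbit[-1])
```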

The method of this paper mainly follows the theory above and focuses on nonlinear evolutionary systems that may appear random and possess a strange attractor. The DNA rules can be expressed in two forms, constraint rules and affirmative rules, and these two forms can be converted into each other. The hypothesis space of the system's attractor is subject to the DNA rules (Theorem 3 in the Supplement). The goal acts on the system's perception of external interference and drives the system to adapt to external conditions. Water is a good example: although its shape is complex and diverse, it always flows along the gradient direction, and so it adapts and generates ever-changing shapes in different environments. We believe that causality is fixed during the evolution of a system even though the external conditions may keep changing. The changes in conditions, the system's state, and its evolution are coupled by the causality, so external changes drive the system to change and evolve. Although the results may look very complex and random in the time domain, the problem may be solved in phase space: the attractor converges to a certain range with certainty, and its inner orbits, although dense, are fixed. Even when external interference causes the system to leave its current orbit for a new one because of this sensitivity, short-term prediction can still be achieved on the attractor manifold. Therefore, the method constructs a system that generates strange attractors and builds an intelligent model with the properties of attractors.

Compared with statistical methods, this method does not model the data but reconstructs the system's causality and attractor, so the model can make logically rigorous and precise predictions. It may work in conditions where traditional machine-learning models do not work well [8], because it is based on causality and has a complete attractor corresponding to the hypothesis space of the whole system. In contrast, statistical learning methods assume fixed model parameters and mine statistical relationships in data. Such a model may not be sensitive to small-probability events. Moreover, the rules that such a model learns from data are not DNA rules, and they are fixed, whereas in fact the parameters may vary over time, and a statistical correlation obtained by statistical learning may no longer hold after a change.

The architecture of the method is shown in Fig. 3. The DNA rules and the goal can be obtained to compose the causality; the causality can also be learned and mined from data. A network, decision tree, formula, or other model can then learn the mined causality, and the fast iteration of a computer can be exploited to generate the attractor from the causality. Such a system is adaptable to changes and can be predicted in the short term. It should be emphasized that the network in this model only learns the mined basic causal rules and causality instead of mining a statistical correlation between input and output. Although the inside of the network is a 'black box', the whole system is still interpretable because the relationship between input and output is the causality. The comparison between the proposed method and the traditional statistical method is shown in Fig. 4.

Fig. 3

System architecture and viewpoint of the proposed method. The theorems used and proposed in the paper are also listed. DNA rules are the basic evolution causalities of the system. According to Theorem 2, for some systems, such as the game of Go, the DNA rules can be obtained from the definition. The causalities are composed of the basic evolution rules and the fixed general evolution goal, such as winning the game. According to Theorem 3, the attractor manifold and the evolution of the system can be generated from the causalities by iteration. The attractor is a complete set of orbits corresponding to the evolution of the system in phase space; from this view, the time-series data of the system is a part of the attractor observed from a certain dimension. The main causalities of the system can be mined and learned from data, and according to Theorem 4 the attractor manifold can also be reconstructed from the time-series data when the DNA rules cannot be obtained directly. Complex decisions can then be made from the attractor. According to Theorems 1, 5 and 6, a system based on the attractor is predictable, so a model constructed by this method can be used to make predictions. In addition, according to Theorem 1, a model based on the attractor can learn complex behaviors and adapt to change, which makes it adaptable to changing conditions

Fig. 4

Comparison of methods. From the viewpoint of this article, the evolution of a system is a process in which the system iterates on its DNA rules and generates an attractor, and the time-series data of the evolution is the observation of the attractor in certain dimensions. The statistical method uses neural networks to mine possible statistical associations directly from the time-series data. Our method generates attractors through the iterative evolution of the DNA rules or constructs an equivalent projection of the system's attractor from the time-series data. For some systems, the DNA rules can be obtained from the definition of the system

Causality can be obtained

Causality is composed of goals and DNA rules. DNA rules are different from rules learned statistically: they are logical, with strict causal relationships. The DNA rules can be obtained from the definition (proved in the Supplement by Theorem 2), because the definition is the precise constraint on a thing. In particular, the definition of a game is composed of DNA rules, as shown in Table 3. The goal is also independent of statistics; examples are getting more rewards, following the lowest-energy principle, or following the gradient. It can be defined artificially, or it can be the response of the system itself. For example, in the jump-and-jump game below, there is an artificial goal of adapting the parameters to win more rewards, whereas in the time-series prediction the goal is the law of the passive response of the system to external interference, which is decided by the system itself (Table 3).

Table 3 DNA rules and the definition
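To make the two forms of DNA rules concrete, the sketch below encodes a few definition-level rules of the jump-and-jump game as executable predicates. The specific formulations are illustrative assumptions of ours, not a transcription of Table 3: a constraint rule rejects states that the definition forbids, while an affirmative rule states what must follow, and either form can be rewritten as the other.

```python
# Illustrative only: assumed definition-level "DNA rules" of the jump-and-jump game.

def constraint_rule_press_time(t_press: float) -> bool:
    """Constraint form: a press duration can never be negative."""
    return t_press >= 0.0

def affirmative_rule_distance(t_press: float, r: float) -> float:
    """Affirmative form: the jump distance is proportional to the press time."""
    return r * t_press

def constraint_rule_score(landed_on_box: bool, score_gain: int) -> bool:
    """Constraint form: the score only increases when the doll lands on a box."""
    return landed_on_box or score_gain == 0
```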

Causality can also be mined by data reconstruction [7]. Equivalent projections of the attractor can be reconstructed from the time series of the system, as shown in Fig. 5, and the causality can then be explored further. The phase-space reconstruction method extends the one-dimensional data to m dimensions, yielding a projection of the original system. The method is given by Formula (3), where x is the time-series data and X is the reconstructed m-dimensional data. In the formula, the parameters τ and m can be obtained by the mutual-information method and the Cao method [8]:

$$ X_{m \times N_{m}} = \left[ \begin{array}{c} X_{1} \\ X_{2} \\ \vdots \\ X_{m} \end{array} \right] = \left[ \begin{array}{cccc} x_{1} & x_{2} & \ldots & x_{N_{m}} \\ x_{1+\tau} & x_{2+\tau} & \ldots & x_{N_{m}+\tau} \\ & \vdots & & \vdots \\ x_{1+(m-1)\tau} & x_{2+(m-1)\tau} & \ldots & x_{N} \end{array} \right]. $$
(3)
Fig. 5

Data-based causal mining and reconstruction. a The Lorenz attractor, and b the reconstructed attractor obtained by the phase-space reconstruction method; the reconstruction is a projection of the original attractor onto the observed data dimension
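A minimal implementation of the delay embedding in Formula (3) might look like the following; τ and m would normally be chosen by the mutual-information and Cao methods mentioned above, and here they are simply passed in as parameters:

```python
import numpy as np

def delay_embed(x, m, tau):
    """Build the m x N_m matrix of delay vectors from a 1-D time series x (Formula (3))."""
    x = np.asarray(x)
    n_vectors = len(x) - (m - 1) * tau              # number of columns N_m
    return np.array([x[i * tau : i * tau + n_vectors] for i in range(m)])

# Example usage with the x-coordinate of the Lorenz orbit generated earlier:
# X = delay_embed(orbit[:, 0], m=3, tau=10)         # rows are delayed copies of the series
```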

Experiment

The jump-and-jump game is easy to set up, so anyone can reproduce the experiment. In the game, the player presses the screen to make a doll jump to the center of the next box; the distance the doll jumps is proportional to the duration of the press. The distance between boxes varies. The more boxes the doll lands on, the higher the player's score. A set of experimental devices was used to play the game, as shown in Fig. 6.

Fig. 6

Experimental devices. The game runs on the phone and the real-time images are pushed to the PC. The PC analyses the images and controls a stepper motor to press the screen with a capacitive pen. There is a changing interference in the device, so the ratio of time to distance must be adjusted constantly to adapt to it and even to predict it

The experimental device is not a linear system because of the interference. For example, the interference in the motor keeps changing and is affected by many factors; as shown in Fig. 7, it appears random. In addition, the conversion ratio r between distance and time changes with the display pixels of different phones. If a fixed ratio r is used, the game is lost because of the changing interference. Therefore, the ratio must be a variable rather than a constant: it must not only convert distance to pressing time but also account for the interference. In other words, the task is not simply to find a fixed optimal ratio r; the system must become adaptive and even predict the interference.

Fig. 7

Interference in the experimental devices. The data t are generated by pressing the screen at a nominally constant interval; the real interval between presses keeps changing. The data appear random and are difficult to predict, and they cannot be predicted simply with a network. The interference acts on the system and makes it complex

In the above game, the four DNA rules shown in Table 2 can be obtained, and the iteration process is based on these rules. During play, the control error caused by the interference is \( t_{e(k)} \), the control time is \( t_{c(k)} \), and the actual time the screen is pressed is \( t_{a(k)} \). The distance from the doll to the center of the next box is \( l_{d(k)} \), the distance the doll actually jumps is \( l_{a(k)} \), and the error is \( l_{e(k)} \). The state after each jump is \( s_{(k)} \): 1 means the doll landed on the box and 0 means it did not. \( f_{r} \) denotes the iteration process based on the four DNA rules. The relationships are as follows:

$$ t_{a(k)} = t_{c(k)} + t_{e(k)} . $$
(4)
$$ l_{a(k)} = l_{d(k)} + l_{e(k)} . $$
(5)
$$ r_{(k + 1)} = f_{r} \left( r_{(k)} , l_{a(k)} , l_{d(k)} , s_{(k)} \right) = \left\{ \begin{array}{ll} \dfrac{2 l_{d(k)} - l_{a(k)}}{l_{d(k)}} \, r_{(k)} , & s_{(k)} = 1 \\[2mm] \left( 1 + 0.2 \, \dfrac{l_{d(k)} - l_{a(k)}}{\left| l_{d(k)} - l_{a(k)} \right|} \right) r_{(k)} , & s_{(k)} = 0 \end{array} \right. . $$
(6)

Therefore, \( r_{(k)} \) keeps changing to adapt to the interference through the iteration process based on the DNA rules. The iteration process is shown in Algorithm 1.
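Algorithm 1 is not reproduced here, but the core update of Formula (6) could be sketched as follows, assuming the press time is obtained as t = r · l_d and that the functions press_screen and measure_jump (hypothetical names) wrap the motor control and image analysis, which are omitted:

```python
def update_ratio(r, l_actual, l_desired, on_box):
    """One iteration of Formula (6): adapt the time/distance ratio r."""
    if on_box:                                    # s_(k) = 1: landed on the box
        return (2 * l_desired - l_actual) / l_desired * r
    # s_(k) = 0: missed the box; nudge r by 20% in the corrective direction
    sign = 1.0 if l_desired > l_actual else -1.0
    return (1.0 + 0.2 * sign) * r

def play_step(r, l_desired, press_screen, measure_jump):
    """One game step: convert distance to press time, jump, then adapt r."""
    t_control = r * l_desired                     # press duration from current ratio
    press_screen(t_control)                       # actuator adds an unknown error t_e
    l_actual, on_box = measure_jump()             # measured jump length and landing state
    return update_ratio(r, l_actual, l_desired, on_box)
```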

The iteration based on the DNA rules in fact reconstructs the nonlinear system, including the changing interference, and generates the attractor of the whole system. The system can then adapt and be predicted within the Lyapunov time based on the attractor. Furthermore, the iterative process continually corrects r, so errors do not accumulate and lead to failure. It is therefore possible to jump to the center of the next box throughout the game and obtain a very high score, as shown in Fig. 8. The iterative process during the game is shown in Fig. 9.

Fig. 8

Based on the method in this paper, the model can adapt to the interference and obtain a very high score after training. The pictures were taken during the game. a The model failed the first time, just as a person does when first playing. b The model failed again but performed better than the first time. c The doll successfully jumped onto the box, although still inaccurately. d The doll jumped closer to the center of the next box. e, f The model plays well although there is random interference in the game. The progression from a to f shows that the model masters the game quickly: it learns, predicts, and adapts to the interference as part of the whole system. g, h Results after the device was adjusted; the adjustment changes both the interference and the ratio. The model is highly adaptable to changes, so it quickly masters the game again and obtains a very high score even after the game device is changed

Fig. 9

Iterative process of the system during the game. The iterative process is the process of learning and adapting to the game and of generating the attractor of the system; the generation of the attractor records the orbits and the system's responses to interference. While the system iterates, the ratio r is updated according to the DNA rules and the goal. The process can be explained vividly by a tree, although the attractor is not a decision tree: the tree simply shows the changes of the iteration on a plane, whereas the iterative process is actually the attractor, which converges within a certain range in phase space. The horizontal coordinate of the tree corresponds to the value of the ratio, and each fork represents an iteration. This continuously iterated tree can reach any positive number, and at the same time the rules flexibly adjust the ratio when the conditions change. The portion bolded in orange represents the iterations driven by the interference in one particular game, and the picture on the right corresponds to the progress of that game

We also compared our method with a statistical method by predicting the disturbance sequence in the above system. The interference sequence is shown in Fig. 7: the data are generated by pressing the screen at a nominally constant interval, yet the real interval between presses keeps changing. The interference is part of the whole experimental device, and its causality is unknown, so we mined the causality from the data and reconstructed it.

In the experiment, a low-pass filter is first used to remove glitches while preserving the characteristics of the data. The phase-space reconstruction method is then applied to reconstruct the attractor and mine the causality. A simple BP (back-propagation) network learns the reconstructed data and causality and makes further predictions. An LSTM (long short-term memory) network, whose structure is well suited to sequence data, is used for comparison. The results are shown in Fig. 10.
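A hedged sketch of this pipeline using off-the-shelf components is given below: a Butterworth low-pass filter from SciPy and a small multilayer perceptron from scikit-learn stand in for the BP network. The delay τ = 2 and dimension m = 4 follow the values reported with Fig. 10; the filter order, cutoff, network size, and train/test split are assumptions of ours.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.neural_network import MLPRegressor

def predict_interference(t_series, tau=2, m=4, cutoff=0.2):
    """Filter the sequence, delay-embed it, and fit a small BP-style network."""
    b, a = butter(3, cutoff)                         # low-pass filter (order/cutoff assumed)
    x = filtfilt(b, a, np.asarray(t_series, dtype=float))

    # Delay vectors (Formula (3)) as inputs, the next sample as the target.
    span = (m - 1) * tau
    X = np.array([x[i : i + span + 1 : tau] for i in range(len(x) - span - 1)])
    y = x[span + 1 :]

    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
    net.fit(X[:-100], y[:-100])                      # hold out the tail for testing
    return net.predict(X[-100:]), y[-100:]           # predicted vs. true values
```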

Fig. 10

Predictions of the interference. The prediction is implemented by mining the causality and learning it with a BP network. In the reconstruction, the optimal delay time of the sequence is 2 and the embedding dimension m is 4. After the time series is reconstructed, the main causality generating the interference is mined; the BP network then learns this causality and makes predictions. a The simple BP network performs as well as the more complex LSTM method. b Comparison of the two prediction methods

The results show that the trend of the LSTM prediction and that of the causality-based method are similar: both fit the main trend of the time series. This implies that the seemingly random interference has inherent rules that can be mined and learned, so the sequence is not actually random. Compared with the LSTM model, the causality-based model requires only a simple BP network rather than a complex one, because the model only needs to learn the mined causality; this makes ordinary networks more versatile. Looking at the details, the reconstruction-based network predicts some details better than the LSTM. The mean square errors of the two models are similar, about 6.8 for the LSTM and about 6.9 for the BP network, which means the BP network also works well. The results also show that both models have shortcomings in fitting details: on the one hand, the disturbance is very short and easily affected by secondary factors; on the other hand, there are errors in the data collection because the accuracy of the timer is limited.

In the jump-and-jump game, an attractor is generated by iteration based on causality, and the adaptability of the attractor is exploited to adapt to the constantly changing interference in the system, so the system obtains a very high score. The second experiment realized prediction by mining and learning the causality; learning the causality is equivalent to learning the system's attractor, because the attractor can be generated from the causality. Based on these experiments, we believe that data that appear chaotic and random as a time sequence are predictable from the perspective of the attractor, except for infinitely divergent systems. The apparent uncertainty in time-sequence data may arise because the evolutionary iteration of the system in phase space looks random when observed from a single dimension, or because external interference drives the system to change orbits within the attractor. In the first case, prediction can be achieved by constructing the attractor; in the second, the system can be predicted in the short term using our attractor-based method (Theorems 1, 5 and 6 in the Supplement).

Conclusion

Artificial intelligence aims to simulate, achieve, and eventually exceed the level of human intelligence through programming, and prediction is one of its important problems. In this paper, the proposed method generates a model by iteration based on causality, which differs from current methods based on statistical learning. The model is adaptable and able to predict, and it works well even in out-of-sample situations because it has a complete attractor; it is also interpretable and does not depend on the amount of data [11]. In addition, the model is robust and flexible, and the attractor shows that the distribution of optimal solutions is sparse and fractal. These properties are proved in the supplement. The method may be a fruitful subject for further research. Finally, a conjecture is proposed.

Conjecture 1

The definition provides the best DNA rules for the iteration that generates an intelligent system.

A model based on the definition models the system itself, whereas a model based on data may reflect only part of the system. The definition is the exact meaning of an object: anything beyond the definition no longer belongs to that category. In the experiment, if the rules contain more than the definition, the range of the attractor becomes smaller, resulting in poor adaptation; if the rules contain less than the definition, the range of the attractor obtained by iteration becomes larger, resulting in errors. We compare the solution sets of the definition and of other features describing the same thing. Let \( S_{d} \) be the solution set corresponding to the definition and \( S_{i} \) the solution set corresponding to other features. The following relationship must hold, given by Formula (7):

$$ S_{i} \subseteq S_{d} . $$
(7)

The solution set \( S_{i} \) is a subset of \( S_{d} \). This also explains why a model based on DNA rules is more adaptable and can work even in out-of-sample conditions.

Just as fruit grown without pesticides and fertilizers is more fragrant, our model does not require complex human knowledge or statistics. We therefore call the model generated by the iterative process based on the DNA rules corresponding to the definition natural intelligence (NI).