In this research, we present SLYKLatent, a novel approach for enhancing gaze estimation by addressing appearance instability challenges in datasets due to aleatoric uncertainties, covariate shifts, and test-domain generalization. SLYKLatent uses self-supervised learning for initial training on facial expression datasets, followed by refinement with a patch-based tri-branch network and an inverse explained-variance-weighted training loss function. Our evaluation on benchmark datasets achieves an 8.7% improvement on Gaze360, rivals top MPIIFaceGaze results, and leads on a subset of ETH-XGaze by 13%, surpassing existing methods by significant margins. Additionally, adaptability tests on RAF-DB and AffectNet yield accuracies of 86.4% and 60.9%, respectively. Ablation studies confirm the effectiveness of SLYKLatent's novel components. This approach has strong potential for human-robot interaction.
To date, the dynamic mechanisms by which the corticospinal tract (CST) and its alternative tract (i.e., the reticulospinal tract, RST) interact and evolve after the CST has been damaged by stroke have not been fully explored. To gain insight into these mechanisms, we construct a computational model that reproduces several critical features of the subscore distributions of the Fugl-Meyer assessment (FMA) for the upper extremity following stroke. FMA subscores offer clues about the working neural substrates affected by stroke, potentially distinguishing preferential use of the CST and RST. A stochastic gradient descent method is employed to emulate biologically plausible phenomena, including activity- or use-dependent plasticity and the preferred use of more strongly connected neural circuits. The model replicates several lines of empirical evidence from imaging and neurophysiological studies. One main prediction is that substantial CST recovery is achievable unless the initial proportion of residual corticospinal neurons after stroke falls below a certain level. Another is that while the functional capabilities of the CST and RST increase in a harmonious way post-stroke, the degrees of functional capability the two tracts ultimately reach are in a competitive relationship. We confirm that the neural system prioritizes optimizing the more strongly connected motor tract and uses the other tract in a supplementary manner to enhance overall motor capability. This model offers insights into efficient therapy design.
This work presents an analytical electro-thermal model for SMD-based printed circuit board (PCB) power converters. Temperature-dependent component losses are derived from analytical models, and a 3-D thermal resistance network is employed to characterize the temperatures across components and PCB paths. Furthermore, the work explores the mechanical and thermal interaction within the PCB paths, concurrently analyzing the semiconductor switches and the power inductor in synchronous commutation-cell configurations. The proposed model is evaluated on two different PCB layouts of a synchronous boost converter operating at 350 kHz with 50 W and 75 W. Model-generated temperatures are compared with experimental measurements from a thermal imaging camera and with finite element analysis (FEA) in Ansys Icepak. The results validate the accuracy of the proposed model.
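At steady state, a thermal resistance network of this kind reduces to a linear heat-balance system. As a hedged illustration only, the sketch below solves a two-node network (one switch, one inductor, coupled through a PCB path); all resistance and loss values are arbitrary placeholders, not the paper's layout or parameters.

```python
def solve_two_node(P1, P2, R1a, R2a, R12, T_amb):
    """Steady-state temperatures of a toy 2-node thermal resistance network.

    Heat balance (W) at each node:
      (T1 - T_amb)/R1a + (T1 - T2)/R12 = P1
      (T2 - T_amb)/R2a + (T2 - T1)/R12 = P2
    Solved in closed form via Cramer's rule on the 2x2 system.
    """
    a11 = 1.0 / R1a + 1.0 / R12
    a22 = 1.0 / R2a + 1.0 / R12
    a12 = -1.0 / R12
    b1 = P1 + T_amb / R1a
    b2 = P2 + T_amb / R2a
    det = a11 * a22 - a12 * a12
    T1 = (b1 * a22 - a12 * b2) / det
    T2 = (a11 * b2 - a12 * b1) / det
    return T1, T2

# Placeholder values: 2 W switch loss, 1 W inductor loss, 25 C ambient
T_sw, T_ind = solve_two_node(P1=2.0, P2=1.0, R1a=10.0, R2a=15.0, R12=5.0, T_amb=25.0)
```

A full PCB model would carry many more nodes and temperature-dependent losses, requiring an iterative solve, but the per-node balance equations take the same form.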
Over the last few years, a plethora of papers presenting machine learning-based approaches for intrusion detection have been published. However, the majority of those papers do not compare their results with a proper baseline of a signature-based intrusion detection system, thus violating good machine learning practice. To evaluate the pros and cons of the machine learning-based approach, we replicated a research study that uses a deep neural network model for intrusion detection. The results of our replication expose several systematic problems with the datasets and evaluation methods used. In our experiments, a signature-based intrusion detection system with a minimal setup was able to outperform the tested model even under small traffic changes. Moreover, when the replicated neural network was tested on a new dataset recorded in the same environment with the same attacks and the same tools, its accuracy dropped to 54%. Furthermore, the often-claimed advantage of being able to detect zero-day attacks could not be observed in our experiments.
The metaverse, a 3D virtual universe, is expected to significantly impact the education sector by making learning more accessible, personalized, and fun. Advancements in AI, blockchain, extended reality, big data, and cloud computing are the key enablers for the development of educational metaverses. Recent disruptions in AI, particularly generative AI (GenAI), have transformed educational practices by generating human-like text, automating conversations, providing personalized learning experiences, and supporting students with disabilities. AI advancements, together with immersive technologies, hold immense potential to transform conventional education and learning by providing an interactive and immersive platform for seamless learning experiences. As GenAI advances, it is expected to generate more accurate and higher-quality content, with future applications in the educational metaverse enhancing trustworthiness. This article contributes background research on AI in education, a detailed study of the educational metaverse, a critical discussion of proactive measures to achieve trustworthy AI (TAI), and open research issues in the context of TAI in the educational metaverse.
In this work, the problem of predicting a pedestrian's intention to cross the road is addressed using visual data captured from a camera. The proposed ROS-based modular architecture consists of four modules: visual perception, intention prediction, planning, and control. The visual perception module is further divided into three sub-modules. First, pedestrian detection is responsible for detecting the pedestrian and analyzing their state using motion and looking classifiers. Second, lane detection analyzes the structured environment, which supports the road-state classifiers. The third sub-module extracts curvilinear localization states that are essential for the vehicle's motion planning and control. The intention prediction module captures the pedestrian's intention to cross the road. Within this module, a comparative study is conducted between three different data-driven sequential models. Each model is trained on the JAAD dataset with different features extracted from the visual perception module. The proposed GRU model obtained an 86% average F1-score and can predict a pedestrian's intention three seconds before crossing. To control the maneuver of the vehicle, a Proportional-Integral (PI) controller is implemented for longitudinal velocity control to brake the vehicle and avoid collision with the pedestrian, and a Linderoth controller is used to control the lateral motion of the vehicle. Finally, this work is verified on a 1:4-scale real vehicle to ensure its applicability on real hardware.
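As a minimal sketch of the longitudinal braking logic described above: a discrete PI controller drives the velocity error to zero when a crossing pedestrian is predicted. The gains, time step, and toy vehicle model below are illustrative assumptions, not the parameters used in the work.

```python
class PIController:
    """Discrete Proportional-Integral controller tracking a target velocity."""
    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def step(self, target_v, current_v):
        error = target_v - current_v
        self.integral += error * self.dt          # accumulate integral term
        return self.kp * error + self.ki * self.integral  # commanded acceleration

# Toy simulation: brake from 5 m/s to a stop (gains are placeholders)
pi = PIController(kp=1.5, ki=0.2, dt=0.1)
v = 5.0  # m/s
for _ in range(200):
    a = pi.step(target_v=0.0, current_v=v)
    v = max(0.0, v + a * pi.dt)  # vehicle cannot move backwards while braking
```

In the actual system, the commanded acceleration would be mapped to a brake actuation on the scaled vehicle rather than applied to a point-mass model.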
Sir J. C. Bose was the first to demonstrate wireless transmission with his indigenous setup. His patent for the galena detector and his reports on a few microwave components are well recognized. In this paper, a few of his experiments that are somewhat less discussed but recognized by experts as firsts are listed and described. These include his detector as the first IR detector, the first experiment on light tunneling, the jute polarizer as the first chiral metamaterial, hysteresis in the I-V curve of the coherer as the first signature of memristor action, and a polarizer with alternate layers of paper and tin foil as the first structure exhibiting both a photonic band gap and a superlattice. The relevance of his work to devices in current electronics, photonics, and information technology is pointed out. Comments by experts in these areas are also included.
Anomaly detection in streaming data is a crucial task for many real-world applications, such as network security, fraud detection, and system monitoring. However, streaming data often exhibit concept drift, meaning that the data distribution changes over time. This poses a significant challenge for anomaly detection algorithms, which must adapt to the evolving data to maintain high detection accuracy. Existing streaming anomaly detection algorithms lack a unified evaluation framework that can assess their performance and robustness under different types of concept drift and anomaly. In this paper, we conduct a systematic technical review of the state-of-the-art methods for anomaly detection in streaming data. We propose a new data generator, called SCAR (Streaming data generator with Customizable Anomalies and concept dRifts), that can synthesize streaming data based on synthetic and real-world datasets from different domains. Furthermore, we adapt four static anomaly detection models to the streaming setting using a generic reconstruction strategy as baselines and compare them systematically with nine existing streaming anomaly detection algorithms on 76 synthesized datasets exhibiting various types of anomalies and concept drifts. The challenges and future research directions for anomaly detection in streaming data are also presented. All code and datasets are publicly available at https://github.com/yixiaoma666/SCAR.
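The abstract's generic reconstruction strategy is not specified in detail; purely as an illustrative sketch, a streaming detector can score each arriving point by its reconstruction error against a model maintained over a sliding window, which lets the baseline track concept drift. The window size and the rank-zero "reconstruct as the window mean" model below are assumptions for illustration, not the paper's adaptation.

```python
from collections import deque
import math

class StreamingReconstructionDetector:
    """Score each point by normalized reconstruction error over a sliding window.

    Illustrative only: here a point is 'reconstructed' as the window mean
    (a rank-zero model); real baselines would refit a learned model per window.
    """
    def __init__(self, window=100):
        self.buf = deque(maxlen=window)  # old points fall out as drift occurs

    def score(self, x):
        if len(self.buf) < 2:
            self.buf.append(x)
            return 0.0
        mean = sum(self.buf) / len(self.buf)
        var = sum((v - mean) ** 2 for v in self.buf) / (len(self.buf) - 1)
        err = abs(x - mean) / (math.sqrt(var) + 1e-9)  # normalized error
        self.buf.append(x)  # update window so the model adapts over time
        return err

det = StreamingReconstructionDetector(window=50)
scores = [det.score(v) for v in [0.0] * 50 + [10.0]]  # last point is anomalous
```

High scores flag candidate anomalies; because the window slides, a persistent distribution shift is gradually absorbed rather than flagged forever.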
Regression models in machine learning require regularization to balance the bias-variance tradeoff and attain realistic predictions in the real world. This paper discusses two new regularization techniques, collectively referred to as BiasWrappers: BiasWrapperC1 and BiasWrapperC2. BiasWrapperC1 uses a form of penalization to prevent models from consistently overshooting or undershooting. BiasWrapperC2 uses a modified layer of regression stacking to identify correlations in a regression model's error. The logic of each technique is presented as pseudocode in the context of machine learning regression. The regularization techniques are applied to machine learning models and compared with other regularization techniques on a series of carefully chosen datasets, and the resulting metrics are used to hypothesize about the implications of the new techniques. All implementations are referenced with pseudocode in the paper, with external testing wrappers programmed in Python. An experimental study conducted on standard regression datasets showed the regularizations' value propositions for multi-output data and outlier-heavy data.
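The paper's pseudocode is not reproduced here. Purely as a hedged sketch of the general idea behind correcting systematic over- or undershoot, a wrapper can estimate a fitted base model's mean signed residual and cancel it at prediction time. The class name, correction rule, and toy base model below are all illustrative assumptions, not BiasWrapperC1 itself.

```python
class MeanBiasCorrector:
    """Wrap a regressor and cancel its average signed error (illustrative sketch)."""
    def __init__(self, base):
        self.base = base
        self.offset = 0.0

    def fit(self, X, y):
        self.base.fit(X, y)
        preds = self.base.predict(X)
        # Mean signed residual: positive means the base model undershoots on average
        self.offset = sum(t - p for t, p in zip(y, preds)) / len(y)
        return self

    def predict(self, X):
        return [p + self.offset for p in self.base.predict(X)]

class ConstantOvershootModel:
    """Toy base model that always overshoots the target by 1.0."""
    def fit(self, X, y):
        pass
    def predict(self, X):
        return [x + 1.0 for x in X]

m = MeanBiasCorrector(ConstantOvershootModel()).fit([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
corrected = m.predict([5.0])
```

A wrapper of this shape plugs into any estimator exposing `fit`/`predict`, which matches how external testing wrappers are typically composed in Python.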
Emotion monitoring in driving is important. Emotions can affect attention, memory, and decision-making and have a significant impact on our driving behaviors and safety. However, measuring and interpreting emotions is challenging: the same emotion can have different manifestations, and different emotions can have similar manifestations. Contextualizing emotions can help with the interpretation and translation of emotional states. However, research on context and drivers' emotional states is limited. We investigate the effect of time, area, weather, surrounding conditions, and traffic conditions on drivers' emotions. Sixty-four images of various driving scenarios were generated using DALL·E 2, a generative AI model, and 238 participants were recruited through Prolific to report how they would feel driving in such contexts. The results showed that rainy weather, tumultuous surroundings, and high-traffic conditions were associated with an increase in negative emotions. On the other hand, driving in rural areas, in the morning, or with no traffic increased the intensity of positive emotions, while rainy weather increased the intensity of negative emotions. The findings can guide the development of driver monitoring systems that account for drivers' emotional states.
The year 1948 witnessed the historic moment of the birth of classic information theory (CIT). Guided by CIT, modern communication techniques have approached the theoretical limits, such as the entropy function H(U), the channel capacity C = max_{p(x)} I(X;Y), and the rate-distortion function R(D) = min_{p(x̂|x): E[d(x,x̂)] ≤ D} I(X;X̂). Semantic communication paves a new direction for future communication techniques, whereas its guiding theory is still missing. In this paper, we try to establish a systematic framework of semantic information theory (SIT). We investigate the behavior of semantic communication and find that synonymy is its basic feature, so we define the synonymous mapping between semantic information and syntactic information. Stemming from this core concept of synonymous mapping, we introduce the measures of semantic information, such as the semantic entropy H_s(Ũ), the up/down semantic mutual information I^s(X̃;Ỹ) and I_s(X̃;Ỹ), the semantic capacity C_s = max_{p(x)} I^s(X̃;Ỹ), and the semantic rate-distortion function R_s(D) = min_{p(x̂|x): E[d_s(x̃,x̂)] ≤ D} I_s(X̃;X̂). Furthermore, we prove three coding theorems of SIT by using random coding and (jointly) typical decoding/encoding: the semantic source coding theorem, the semantic channel coding theorem, and the semantic rate-distortion coding theorem. We find that the limits of SIT are extended by synonymous mapping, that is, H_s(Ũ) ≤ H(U), C_s ≥ C, and R_s(D) ≤ R(D). All these results constitute the basis of semantic information theory. In addition, we discuss the semantic information measures in the continuous case. In particular, for the band-limited Gaussian channel, we obtain a new channel capacity formula, C_s = B log[S^4 (1 + P/(N_0 B))], where S is the synonymous length. In summary, the theoretical framework of SIT proposed in this paper is a natural extension of CIT and may reveal great performance potential for future communication.
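For the band-limited Gaussian case, the gain over the classic limit can be made explicit by expanding the logarithm of the product (assuming a synonymous length S ≥ 1, so the extra term is non-negative):

```latex
C = B \log\!\Bigl(1 + \frac{P}{N_0 B}\Bigr) \qquad \text{(classic Shannon capacity)}

C_s = B \log\!\Bigl[S^{4}\Bigl(1 + \frac{P}{N_0 B}\Bigr)\Bigr]
    = 4B\log S + B \log\!\Bigl(1 + \frac{P}{N_0 B}\Bigr)
    \;\ge\; C \quad (S \ge 1)
```

The decomposition shows the semantic capacity as the Shannon capacity plus a synonymous gain term 4B log S that vanishes when S = 1.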
_Goal:_ Vascular surgical procedures are challenging and require proficient suturing skills. To develop these skills, medical training simulators with objective feedback for formative assessment are gaining popularity. As hardware advancements offer more complex, unique sensors, determining effective task performance measures becomes imperative for efficient suturing training. _Methods:_ 97 subjects of varying clinical expertise completed four trials on a suturing skills measurement and feedback platform (SutureCoach). Instrument handling metrics were calculated from electromagnetic motion trackers affixed to the needle driver. _Results:_ All metrics significantly differentiated novices (no medical experience) from both experts (attending surgeons/fellows) and intermediates (residents). Rotational motion metrics were more consistent than traditionally used tooltip motion metrics in differentiating experts from intermediates. _Conclusions:_ Our work emphasizes the importance of tool motion metrics for open suturing skills assessment and establishes groundwork for exploring rotational motion to quantify a critical facet of surgical performance. _Impact Statement_–This study determines the effectiveness of metrics derived from needle driver rotational and tooltip motion tracking for detecting differences in clinical expertise in open needle driving.
Improving the controllability, portability, and inference speed of diffusion language models (DLMs) is a key challenge in natural language generation. While recent research has shown significant success in complex text generation with language models, their memory and computational demands remain high, which naturally results in low portability and instability of the models. To mitigate these issues, numerous well-established neural network quantization methods have been proposed. To further enhance portability for independent deployment and to improve stability as measured by language perplexity, we propose a novel approach called the Quantized Embedding Controllable Diffusion Language Model (QE-CDLM). QE-CDLM builds upon recent successful controllable DLMs by remodeling the task-specific embedding space via quantization. This yields a gradient-based controller for generation tasks and more stable intermediate latent variables, which naturally brings accelerated convergence as well as better controllability. Additionally, an adaptation fine-tuning method is employed to reduce the number of tunable weights. Experimental results on five challenging fine-grained control tasks demonstrate that QE-CDLM compares favorably to existing methods in terms of quality and feasibility, achieving better perplexity and lightweight fine-tuning.
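Quantizing an embedding space generally means mapping continuous vectors onto a finite codebook. As a generic nearest-neighbor sketch only (the codebook, dimensionality, and L2 distance below are illustrative assumptions, not QE-CDLM's exact quantization scheme):

```python
def quantize_embedding(vec, codebook):
    """Map an embedding vector to its nearest codebook entry (L2 distance).

    Generic vector quantization sketch; the codebook here is a placeholder,
    whereas a trained model would learn its codebook entries.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(codebook, key=lambda c: dist2(vec, c))

# Toy 2-D codebook with four entries
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q = quantize_embedding([0.9, 0.2], codebook)
```

Snapping latents onto a small discrete set is one common way to obtain more stable intermediate representations, since each latent is constrained to a fixed, finite vocabulary of vectors.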
In developing digital twins for power electronics converters and other power system components, selecting an appropriate representation type and level of abstraction is fundamental. The choice of representation should balance fidelity, computational cost, and the objectives of the representation. Digital twins are generally assigned a single, specific representation task; however, various functions can be delegated to a digital twin, leaving room for ambiguity in its design. Digital twins can instead be designed with multi-domain and multi-functional capabilities, allowing them to adapt to diverse system domains and perform a variety of representation tasks. This approach allows the digital twin to be as specialized as the physical asset it serves. This study introduces a framework enabling the development of multi-domain, multi-functional digital twins adaptable to various representation tasks. The framework utilizes a collection of digital images for an accurate depiction of different asset elements, ensuring a detailed yet unified digital twin. The framework analyzes the assigned representation task and selects the most suitable digital image for execution. Details on the development of the framework are provided, and experimental results validate its effectiveness.
Detailed design requirements and foundational assumptions for forecasting digital twins in the context of power systems are explored and applied to real-time forecasting in this study. The forecasting methodology is experimentally validated via an electro-thermal digital twin of power distribution cables for onboard power systems. The digital twin can forecast the thermal profile of the cables by utilizing sensor measurements from the physical twin. When the predicted temperature reaches specific thresholds, the digital twin informs a decision maker to proactively adjust the power flow within the system to prepare for and avoid upcoming thermal constraints in the cable. This adjustment ensures that the physical cable does not reach those thermal constraints, thereby enhancing system reliability. Such proactive management is essential for meeting mission-critical power demand and avoiding load shedding. The concept has been experimentally verified using a three-bus configuration. The developed digital twin is computationally efficient, forecasting only when necessary, and offers an adjustable forecasting time-frame to accommodate a variety of operational scenarios.
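As a hedged illustration of the forecast-then-act loop described above: a first-order thermal response extrapolated over an adjustable horizon, with a trigger when the forecast crosses the limit. The thermal model, time constant, and threshold values are assumed placeholders, not the cable model from the study.

```python
import math

def forecast_temperature(T_now, T_steady, tau, horizon):
    """First-order thermal response: T(t+h) = T_ss + (T_now - T_ss) * exp(-h/tau)."""
    return T_steady + (T_now - T_steady) * math.exp(-horizon / tau)

def needs_action(T_now, T_steady, tau, horizon, threshold):
    """Alert the decision maker if the forecast crosses the thermal limit."""
    return forecast_temperature(T_now, T_steady, tau, horizon) >= threshold

# Cable at 60 C heating toward a 95 C steady state; operating limit 80 C.
# With a 15-minute horizon the limit is forecast to be exceeded.
alarm = needs_action(T_now=60.0, T_steady=95.0, tau=600.0, horizon=900.0, threshold=80.0)
```

The adjustable `horizon` parameter mirrors the abstract's adjustable forecasting time-frame: a short horizon raises no alarm here, while a longer one reveals the upcoming constraint early enough to reroute power.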
Previous methods have demonstrated remarkable performance on single image super-resolution (SISR) tasks with known and fixed degradation (e.g., bicubic downsampling). However, when the actual degradation deviates from these assumptions, these methods may suffer significant performance declines. In this paper, we propose a Dual-Branch Degradation Extractor Network to address the blind SR problem. While some blind SR methods assume noise-free degradation and others do not explicitly consider the presence of noise in the degradation model, our approach predicts two unsupervised degradation embeddings that represent blur and noise information, respectively. The SR network can then be adapted to the blur embedding and the noise embedding in distinct ways. Furthermore, we treat the degradation extractor as a regularizer to capitalize on the differences between SR and HR images. Extensive experiments on several benchmarks demonstrate that our method achieves state-of-the-art performance on the blind SR problem.
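The blur-plus-noise degradation the abstract refers to is commonly written y = (x ⊛ k)↓s + n, i.e., blur with a kernel, downsample by a scale factor, then add noise. A toy 1-D version (the kernel, scale, and noise level are illustrative assumptions, not the paper's degradation settings):

```python
import random

def degrade(signal, kernel, scale, noise_std, rng):
    """Toy 1-D degradation: blur with `kernel`, downsample by `scale`, add noise."""
    half = len(kernel) // 2
    blurred = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - half, 0), len(signal) - 1)  # replicate borders
            acc += w * signal[idx]
        blurred.append(acc)
    down = blurred[::scale]                                  # downsample by s
    return [v + rng.gauss(0.0, noise_std) for v in down]     # additive noise

rng = random.Random(0)  # seeded for reproducibility
hr = [0.0] * 8 + [1.0] * 8                                   # toy "HR" step edge
lr = degrade(hr, kernel=[0.25, 0.5, 0.25], scale=2, noise_std=0.01, rng=rng)
```

Blind SR is hard precisely because `kernel` and `noise_std` are unknown at test time; the two-branch extractor in the paper aims to recover embeddings for those two factors separately.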
In contrast to the modules in conventional modular multilevel and cascaded bridge converters, which support only serial and bypass operation, emerging topologies add parallel inter-module connectivity, enabling sensorless voltage balancing, load current sharing, and enhanced efficiency. However, parallel operation between modules requires matched voltages and characteristics, which is not feasible for mixed-type or heterogeneous battery systems. This paper introduces a reconfigurable battery system designed to solve the challenges of integrating batteries with varying characteristics. Using compact coupled inductors and a novel modulation strategy, the system achieves intermediate states between parallel and series modes when dealing with heterogeneous modules. The coupled inductors, with minimal magnetic material and a small footprint, present a negligible common-mode inductance to the high load current while limiting circulating currents through a high differential-mode inductance. Furthermore, the proposed modulation strategy introduces free DC/DC conversion functionality and enables efficient bidirectional energy transfer, capable of controlling the power exchange between modules largely independently of the output control. This inter-module DC/DC functionality enables effective charge or load balancing among batteries of varying voltages, types, and ages. Importantly, the same transistors perform both the DC/DC functionality and the output control, so the topology requires no additional silicon. Extensive simulations and experiments demonstrate the performance of the system. The proposed system can reduce the inductor's core size by more than 80% with a circulating current as high as one-fifth of the load current, and it reduces the cost of the inductor by more than four times. Moreover, the findings demonstrate a >15% improvement in conduction losses and a >50% improvement in switching losses.
A two-stage knowledge transfer framework for distilling efficient dehazing networks is proposed in this paper. Recently, lightweight dehazing studies based on knowledge distillation have shown great promise. However, existing approaches have focused only on exploiting knowledge extracted from clean images (hard knowledge) while neglecting the concise knowledge encoded in hazy images (soft knowledge). Additionally, recent methods have emphasized solely process-oriented learning rather than response-oriented learning. Motivated by these observations, the proposed framework aptly exploits soft knowledge and response-oriented learning to produce improved dehazing models. A general encoder-decoder dehazing structure is utilized as the teacher network and as the basis for constructing the student model, whose complexity is drastically reduced via a channel multiplier. A transmission-aware loss is adopted that leverages transmission information to enhance the network's generalization across different haze densities. The derived network, called the Soft Knowledge-based Distilled Dehazing Network (SDDN), achieves a significant reduction in complexity while maintaining satisfactory performance and, in certain cases, even showing better generalization capability. Experiments on various benchmark datasets demonstrate that SDDN competes with prevailing dehazing approaches. Moreover, SDDN shows promising applicability to intelligent driving systems. When combined with YOLOv4, SDDN improves detection performance under hazy weather by 9.1% with only a negligible increase in the number of parameters (0.87%). The code of this work is publicly available at https://github.com/tranleanh/sddn.