PulseEdit: Editing Physiological Signal in Facial Videos for Privacy Protection

Recent studies have shown that physiological signals can be remotely captured from human faces using a portable color camera under ambient light. This technology, namely remote photoplethysmography (rPPG), can be used to collect the physiological status of users sitting in front of a camera, which may raise physiological privacy issues. To prevent privacy abuse of rPPG technology, this paper develops PulseEdit, a novel and efficient algorithm that edits the physiological signals in facial videos without affecting visual appearance, protecting the user's physiological signal from disclosure. PulseEdit can either remove the trace of the physiological signal in a video or transform the video to contain a target physiological signal chosen by the user. Experimental results show that PulseEdit can effectively edit physiological signals in facial videos and prevent rPPG-based heart rate measurement. PulseEdit may also be used in adversarial scenarios against some rPPG-based visual security algorithms. We analyze the performance of PulseEdit against rPPG-based liveness detection and rPPG-based deepfake detection, and demonstrate its ability to circumvent these visual security algorithms.


I. INTRODUCTION
VIDEO-CAPTURING devices are ubiquitous in our daily life. These devices make it easy for us to share our lives with friends and to communicate online with others. Yet have we realized that whenever a person appears in front of a camera, others can not only recognize his/her identity from facial appearance but also monitor some aspects of his/her physiological status, such as cardiac activity?
Recent research has shown that contact-free measurement of human physiological signals from facial videos is feasible through computer vision algorithms [1]- [4]. For instance, remote photoplethysmography (rPPG), which captures the subtle color changes of the skin caused by heartbeats in facial videos under ambient light, has attracted growing research and development interest. Heart rate (HR) [5]- [10], respiration rate (RR) [11], [12], and heart rate variability (HRV) [13] can further be inferred from the extracted rPPG signals. This promising technology can be leveraged to build systems for remotely monitoring stress and fatigue during computer tasks [14] and sports training [15].
Recalling the question raised at the very beginning, we recognize that this emerging technology may raise concerns about physiological privacy. Video-capturing devices record not only a person's appearance but also, simultaneously, his/her cardiac activity and physiological status. Such physiological information, intrinsically present in facial videos, may be abused to secretly collect and analyze a person's physiological features with ulterior motives. For example, an opponent could read your physiological status and analyze your psychological activity to gain an advantage in a mission-critical negotiation. In daily life, a person's health conditions may be revealed, without his/her awareness, to a party lacking his/her explicit consent, leading to potential privacy concerns. To address this physiological privacy issue, it is important to investigate how to effectively protect the physiological signals in facial videos from disclosure. To this end, we propose PulseEdit, illustrated in Fig. 1, a novel method that edits rPPG signals in facial videos by superimposing specifically designed small-amplitude perturbations onto the input videos. Our method outputs a video that is visually identical to the input but whose rPPG signal is either removed completely or transformed to a target HR, based on the user's choice. Once a video is processed by PulseEdit, the user's rPPG signal is protected from disclosure.
M. Chen
Fig. 1. PulseEdit can edit the rPPG signal in a facial video to conceal the person's true physiological status without visually distorting his/her appearance. We impose a negligible additive perturbation onto the facial region in the video and successfully modify the HR extracted by the rPPG algorithm. In this example, the HR is edited from 66 to 120 beats per minute (bpm) to avoid disclosing the user's true heart rate.
To make PulseEdit effective in practical use, we consider the following requirements when designing and evaluating the algorithm:
• Invisibility: the editing on the face should be negligible, without obvious appearance distortion.
• Universality: the protection should hold both globally over the face and locally. The processed video should no longer contain the user's true rPPG signal, and the edited rPPG signal should be detectable from the whole face as well as from local skin regions.
• Generality: the protection should conceal a person's true rPPG signal against the various rPPG algorithms in the literature. In other words, the edited rPPG signal should be measurable by various rPPG algorithms.
• Resistance: as an advanced requirement, the editing on the face should resist forensic analysis.
In addition to privacy protection, PulseEdit can affect other applications in which rPPG is employed. More specifically, the rPPG signal has been shown to be a useful and discriminative feature in various visual security tasks, such as liveness detection [16]- [18] and deepfake detection [19], because real/live videos and fake/synthetic videos exhibit different rPPG signals in the facial regions. With PulseEdit, we can edit the rPPG signals in facial videos and circumvent these rPPG-based visual security algorithms. PulseEdit is therefore a potential threat that could invalidate these algorithms, and it points to directions for revising them and improving the confidence of their decisions.
Our main contributions are summarized as follows:
• We develop PulseEdit, a novel algorithm that can edit rPPG signals in a facial video to conceal a person's true cardiac activity and physiological status, without introducing noticeable visual distortion in the video.
• We demonstrate that PulseEdit can provide effective privacy protection under various rPPG extraction algorithms in the literature and robustly edit rPPG signals in global and local facial regions. We further investigate the forensic detectability of PulseEdit against forensic steganalysis.
• We analyze the effectiveness of PulseEdit in circumventing rPPG-based liveness detection and rPPG-based deepfake detection. We show that PulseEdit is promising in circumventing these rPPG-based algorithms, which suggests that more research efforts are needed to improve these rPPG-based visual security algorithms from this adversarial perspective.
In the rest of the paper, we first introduce prior work related to rPPG technology and its application in visual security tasks in Section II. Section III describes the proposed PulseEdit method for editing rPPG signals in facial videos. In Section IV, we carry out a comprehensive performance analysis of PulseEdit for removing/modifying rPPG signals in facial videos, and in Section V we explore its feasibility as a potential adversary against rPPG-based liveness detection and deepfake detection algorithms. Finally, Section VI provides a discussion and Section VII concludes the paper.

A. rPPG Technology
Monitoring cardiac activity is essential for understanding a person's health status and is actively used in clinical practices and home care. Conventional methods require contact-based sensors attached to the human skin, such as electrocardiogram leads, a pulse oximeter, or a fitness tracker.
Recently, rPPG has enabled contact-free HR measurement using color cameras. The principle of rPPG is that blood volume changes under the skin influence the intensity and color of the light reflected from the skin, and this pattern follows the heartbeat cycles. Although such subtle momentary changes in the light reflected from the facial skin are not visible to the human eye, they can be captured by a color camera [1]. Eulerian video magnification [20] can amplify and visualize the subtle color changes in a facial video caused by blood flow. Independent component analysis (ICA) [13], chrominance mapping (CHROM) [2], and plane-orthogonal-to-skin (POS) [4] were proposed to extract robust rPPG features from the three color channels. Li et al. [5] applied adaptive filtering to handle environmental illumination and voluntary motion in remote HR measurement. Tulyakov et al. [6] proposed self-adaptive matrix completion to denoise rPPG features and offer robust HR estimation. The challenging fitness scenario [21], [22] has also been studied to improve the robustness of rPPG technology. End-to-end deep learning models [7], [9] have also been introduced to estimate HR from videos.
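To make the projection idea behind POS concrete, a minimal NumPy sketch of the POS algorithm [4] and of a simple spectral-peak HR estimator follows. The 1.6 s sliding window, the 42 to 240 bpm search band, the function names, and the synthetic per-channel pulse strengths in the demo are illustrative assumptions, not the settings of the rPPG implementations evaluated in this paper.

```python
import numpy as np

# POS projection matrix; each row sums to zero, so a change common to all
# three temporally normalized channels is cancelled.
POS_PROJECTION = np.array([[0.0, 1.0, -1.0],
                           [-2.0, 1.0, 1.0]])


def pos_rppg(rgb: np.ndarray, fps: float, win_sec: float = 1.6) -> np.ndarray:
    """Pulse signal from an (N, 3) trace of spatially averaged R, G, B skin values."""
    n = rgb.shape[0]
    win = int(round(win_sec * fps))
    pulse = np.zeros(n)
    for start in range(n - win + 1):
        c = rgb[start:start + win].T                  # (3, win) sliding window
        cn = c / c.mean(axis=1, keepdims=True)        # temporal normalization per channel
        s = POS_PROJECTION @ cn                       # (2, win) projected signals
        alpha = s[0].std() / (s[1].std() + 1e-12)     # alpha tuning
        h = s[0] + alpha * s[1]
        pulse[start:start + win] += h - h.mean()      # overlap-add of zero-mean segments
    return pulse


def estimate_hr_bpm(signal: np.ndarray, fps: float,
                    lo_bpm: float = 42.0, hi_bpm: float = 240.0) -> float:
    """Frequency (bpm) of the largest magnitude-spectrum peak within [lo_bpm, hi_bpm]."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs_bpm = np.fft.rfftfreq(signal.size, d=1.0 / fps) * 60.0
    band = (freqs_bpm >= lo_bpm) & (freqs_bpm <= hi_bpm)
    return float(freqs_bpm[band][np.argmax(spectrum[band])])


if __name__ == "__main__":
    fps = 30.0
    t = np.arange(300) / fps
    pulse = np.sin(2 * np.pi * 1.2 * t)                 # synthetic 72 bpm pulse
    strength = np.array([0.33, 0.77, 0.53]) * 1e-2      # hypothetical per-channel pulsatile strength
    rgb = np.array([60.0, 50.0, 40.0]) * (1.0 + np.outer(pulse, strength))
    rgb += np.random.default_rng(0).normal(0.0, 0.05, rgb.shape)
    print(estimate_hr_bpm(pos_rppg(rgb, fps), fps))     # expected to be close to 72
```

Because each window is normalized by its temporal mean and the projection rows sum to zero, a global intensity change shared by all channels does not leak into the pulse estimate, which is one reason such chrominance-based projections are comparatively robust.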

B. Biometric Privacy Protection
Biometric privacy protection [23], [24] aims to conceal a person's private information in biometric data and to prevent possible theft and misuse of this information. Traditional biometric privacy protection algorithms were proposed to de-identify persons from biometric features, including the face [25], [26], iris [27], and fingerprint [28]. Deep learning has been introduced to protect privacy in multimodal biometrics [29].
As many methods have been proposed in the past decade to extract physiological signals from facial videos, concerns have been raised concurrently about the privacy of physiological information in videos. This information may be misused to collect and analyze a person's physiological features with ulterior motives. Chen et al. [30] applied motion elimination to facial videos to remove the subtle pulse-induced motion on the subjects' faces and thus avoid disclosure of the rPPG signal. Their experimental results show that the rPPG signals are successfully removed without appearance distortion. Nevertheless, that work studied only the steady case. When the subject performs voluntary motion (e.g., talking, head translation, and rotation) during video recording, it is hard for Chen's method to remove the pulse-induced subtle motion while maintaining the subject's voluntary motion.
In this paper, we propose to edit the rPPG signals that are intrinsically present in facial videos by perturbing the skin pixels on the face, and we conduct experiments on motion cases as well as steady cases. Compared with the prior art, our work can not only remove the rPPG signal in a facial video but also transform it to a target signal if desired by the user.
C. rPPG Feature in Visual Security Tasks
The rPPG signal has been employed as a discriminative feature to tackle several visual security tasks involving face videos, such as liveness detection against spoofing and deepfake detection. Liveness detection is crucial for protecting face recognition systems from spoofing attacks, such as a face printed on paper, a facial video replayed on a digital device, a worn 3D face mask, and other approaches used by adversaries. Liu et al. [16], [17] used the cross-correlation of rPPG features from multiple facial regions to classify live versus spoofed faces. Hernandez et al. [18] proposed analyzing the signal quality of the rPPG extracted from faces to discriminate live faces from spoofed faces.
"Deepfake" refers to technologies by which a computer transforms one person's face into another's in images and videos. Deepfake videos circulating on social media have raised serious concerns through celebrity pornographic videos, fake news, hoaxes, and financial fraud, greatly impairing the integrity of social media. As a result, deepfake detection has attracted much attention in recent computer vision research. Regarding the role of rPPG in deepfake detection, FakeCatcher [19] explored the discriminative features of rPPG signals extracted from facial videos and used them for deepfake detection.

III. PROPOSED METHOD
PulseEdit has three main steps, as shown in Fig. 2. We first detect the facial region in the video and extract skin intensity signals from multiple subregions on the face. We then obtain the perturbation signal by solving an optimization problem that transforms the rPPG signal in the video to a target signal. Finally, we manipulate the skin pixels in the video according to the perturbation signal, so that the rPPG signal is removed from the PulseEdit video or, if desired, transformed to a target rPPG signal. We refer to these two modes as "removal" and "modification", respectively. In the removal mode, the target signal can be white Gaussian noise; in the modification mode, it can be a simulated sinusoid at the frequency of a target HR or the rPPG signal extracted from a reference video of the user's choice.

A. rPPG Extraction
As in prior rPPG research, we first track the subject's face in the video to extract the rPPG signal. We apply the facial landmark detector of Dlib [31] to locate and track 68 facial landmarks, from which we define the facial region of interest (ROI) shown with green dots in the video frame in Fig. 2. To facilitate rPPG extraction from multiple subregions [6], the ROI is normalized to a rectangle using a piecewise linear geometric transformation, and skin pixels are masked by a Gaussian skin color model in chrominance space [32], defined on the chrominance vector $x = [c_b, c_r]^\top$ with mean $m$ and covariance matrix $\Sigma$. Within the masked rectangular facial ROI, we use a rectangle of a quarter size to uniformly select $M$ subregions (subregions may overlap with their neighbors). We compute the spatial average of the skin pixels in each subregion to form the skin intensity signal $R \in \mathbb{R}^{M \times 3 \times N}$ for $M$ subregions, 3 color channels, and $N$ frames in the video. In the following discussion, the subscripts $i$ and $c$ refer to subregion $i$ and color channel $c$, respectively. For example, $R_{i,c}$ denotes the skin intensity signal in subregion $i$ and color channel $c$.
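The skin masking and subregion averaging can be illustrated with the hypothetical NumPy sketch below. It assumes full-range ITU-R BT.601 chrominance, an unnormalized Gaussian likelihood $\exp(-\tfrac{1}{2}(x-m)^\top \Sigma^{-1}(x-m))$ thresholded at 0.5, and reads "a quarter size" as half the ROI height by half the ROI width, with windows on a uniform 6 × 6 grid (M = 36, the value used in Section IV). The model parameters $m$ and $\Sigma$ must be supplied by the caller; no values from [32] are assumed, and all function names are illustrative.

```python
import numpy as np


def rgb_to_cbcr(rgb: np.ndarray) -> np.ndarray:
    """Full-range ITU-R BT.601 chrominance (Cb, Cr) of an (..., 3) RGB array."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([cb, cr], axis=-1)


def skin_mask(frame, m, cov, thresh=0.5):
    """True where the unnormalized Gaussian skin likelihood exceeds `thresh` (assumed rule)."""
    d = rgb_to_cbcr(frame.astype(np.float64)) - m                  # (h, w, 2)
    maha = np.einsum("hwi,ij,hwj->hw", d, np.linalg.inv(cov), d)   # squared Mahalanobis distance
    return np.exp(-0.5 * maha) > thresh


def skin_intensity_signals(frames, m, cov, grid=6):
    """Average skin pixels in grid x grid overlapping windows of quarter area.

    frames: (N, h, w, 3) normalized face ROIs. Returns R with shape
    (grid * grid, 3, N); a window without skin pixels yields NaN.
    """
    n, h, w, _ = frames.shape
    sh, sw = h // 2, w // 2                                         # half height, half width
    corners = [(y, x)
               for y in np.linspace(0, h - sh, grid).astype(int)
               for x in np.linspace(0, w - sw, grid).astype(int)]
    R = np.full((len(corners), 3, n), np.nan)
    for k in range(n):
        frame = frames[k].astype(np.float64)
        mask = skin_mask(frames[k], m, cov)
        for i, (y, x) in enumerate(corners):
            sel = mask[y:y + sh, x:x + sw]
            if sel.any():
                # Mean R, G, B over the skin pixels of this window only.
                R[i, :, k] = frame[y:y + sh, x:x + sw][sel].mean(axis=0)
    return R
```

Averaging only the masked pixels keeps non-skin content such as hair, eyes, or background from diluting the pulsatile component in each subregion.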

B. rPPG Editing
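In this module, our goal is to find a suitable perturbation of the skin intensity signals that changes the rPPG signal in the video to the target signal given by the user. We first detrend the skin intensity signal $R_{i,c}$, $\forall i, c$, to eliminate illumination interference from the environment. In the detrending process, we use $\ell_1$ trend filtering [33], which penalizes the $\ell_1$ norm $\lVert D x \rVert_1$ of the second-order differences of the trend $x$, to obtain the signal trend $\hat{x}_{i,c}$, and we subtract the trend from the skin intensity signal:
$$S_{i,c} = R_{i,c} - \hat{x}_{i,c},$$
where $S \in \mathbb{R}^{M \times 3 \times N}$ denotes the detrended signal, the subscripts $i$ and $c$ denote the subregion and the color channel, and $D \in \mathbb{R}^{(N-2) \times N}$ is the second-order difference matrix
$$D = \begin{bmatrix} 1 & -2 & 1 & & & \\ & 1 & -2 & 1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & 1 & -2 & 1 \end{bmatrix}.$$
We denote by $\delta \in \mathbb{R}^{3 \times N}$ the additive perturbation imposed onto the detrended skin intensity signal $S$, which gives rise to the edited signal $\tilde{S}$:
$$\tilde{S}_{i,c} = S_{i,c} + \delta_c, \quad \forall i, c,$$
where $\delta_c$ denotes the perturbation in color channel $c$.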
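As an illustration of the detrending step, the sketch below builds $D$ and approximately solves the standard $\ell_1$ trend filtering problem $\min_x \tfrac{1}{2}\lVert R_{i,c} - x \rVert_2^2 + \mu \lVert D x \rVert_1$ with ADMM, then subtracts the trend. ADMM is used here only because it fits in a few lines of NumPy; it is not necessarily the solver used in [33], and the weight $\mu$, the ADMM penalty `rho`, and the iteration count are hypothetical values.

```python
import numpy as np


def second_diff_matrix(n: int) -> np.ndarray:
    """(n - 2) x n second-order difference matrix with rows [..., 1, -2, 1, ...]."""
    d = np.zeros((n - 2, n))
    for k in range(n - 2):
        d[k, k:k + 3] = (1.0, -2.0, 1.0)
    return d


def l1_trend(y: np.ndarray, mu: float, rho: float = 10.0, iters: int = 500) -> np.ndarray:
    """Approximate argmin_x 0.5 * ||y - x||^2 + mu * ||D x||_1 via ADMM (split z = D x)."""
    n = y.size
    D = second_diff_matrix(n)
    A_inv = np.linalg.inv(np.eye(n) + rho * D.T @ D)   # x-update matrix is fixed, invert once
    z = np.zeros(n - 2)                                  # auxiliary copy of D x
    u = np.zeros(n - 2)                                  # scaled dual variable
    x = y.copy()
    for _ in range(iters):
        x = A_inv @ (y + rho * D.T @ (z - u))
        v = D @ x + u
        z = np.sign(v) * np.maximum(np.abs(v) - mu / rho, 0.0)   # soft thresholding
        u = v - z                                                # dual update: u + D x - z
    return x


def detrend(y: np.ndarray, mu: float) -> np.ndarray:
    """Detrended signal: the input minus its l1 trend estimate."""
    return y - l1_trend(y, mu)
```

When $\mu$ is large enough, the penalty forces $Dx = 0$ and the trend reduces to the least-squares line; a smaller $\mu$ lets the trend bend at a few kink points, which is how $\ell_1$ trend filtering follows piecewise-linear illumination drifts while leaving the faster pulsatile component in $S_{i,c}$.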
Next, we set up the target rPPG signal $T \in \mathbb{R}^{3 \times N}$. To ensure that the output video contains the target rPPG signal $T$, we maximize the similarity between the edited signals $\tilde{S}$ and the target signal $T$, measured by the Pearson correlation coefficient, which for two signals $a, b \in \mathbb{R}^N$ with means $\bar{a}$ and $\bar{b}$ is
$$\rho(a, b) = \frac{(a - \bar{a}\mathbf{1})^\top (b - \bar{b}\mathbf{1})}{\lVert a - \bar{a}\mathbf{1} \rVert_2 \, \lVert b - \bar{b}\mathbf{1} \rVert_2}.$$
For an edited facial video, we require that the person in the video show negligible perceptual distortion. Thus, we regularize the perturbation signal $\delta$ with an $L_2$ loss to control the perturbation budget in the facial video.
Fig. 2. Pipeline of the PulseEdit system. We first extract skin intensity signals from multiple facial subregions in the video. Then, we compute the perturbation signal that changes the rPPG signals in the video to the target rPPG signal. Finally, we edit the skin pixels in the video, and the rPPG signal extracted from the video processed by PulseEdit is successfully transformed to the target signal.
Combining the above two terms, with a weight λ on the $L_2$ regularizer, we obtain the perturbation signal $\delta$ by solving the optimization problem (6). A gradient-based solver (e.g., the Adam solver [34]) can be used to solve (6).
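A minimal sketch of solving the perturbation problem with Adam is given below. The way the correlation term is aggregated and the normalization of the $L_2$ term are assumptions: the sketch minimizes $-\frac{1}{3M}\sum_{i,c}\rho(S_{i,c}+\delta_c, T_c) + \lambda \cdot \mathrm{mean}(\delta^2)$, pairing each subregion's channel $c$ with target channel $T_c$. The learning rate 0.1 and 200 iterations follow the settings stated in Section IV; the Pearson gradient is written out analytically so that only NumPy is needed, and the function names are hypothetical.

```python
import numpy as np


def pearson_and_grad(a, b):
    """Pearson correlation rho(a, b) and its gradient with respect to a."""
    ac, bc = a - a.mean(), b - b.mean()
    na, nb = np.linalg.norm(ac), np.linalg.norm(bc)
    rho = ac @ bc / (na * nb)
    grad = bc / (na * nb) - rho * ac / na**2   # already zero-mean, so no extra centering needed
    return rho, grad


def edit_perturbation(S, T, lam, lr=0.1, iters=200, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam on: -mean_{i,c} rho(S[i, c] + delta[c], T[c]) + lam * mean(delta**2).

    S: detrended skin signals (M, 3, N); T: target signal (3, N). Returns delta (3, N).
    """
    M, C, N = S.shape
    delta = np.zeros((C, N))
    m = np.zeros_like(delta)                   # first-moment estimate
    v = np.zeros_like(delta)                   # second-moment estimate
    for step in range(1, iters + 1):
        grad = 2.0 * lam * delta / delta.size  # gradient of the L2 budget term
        for i in range(M):
            for c in range(C):
                _, g = pearson_and_grad(S[i, c] + delta[c], T[c])
                grad[c] -= g / (M * C)         # gradient of the negative mean correlation
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad**2
        m_hat = m / (1 - beta1**step)          # bias correction
        v_hat = v / (1 - beta2**step)
        delta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return delta


def mean_corr(S, delta, T):
    """Average Pearson correlation between the edited signals and the target."""
    return float(np.mean([pearson_and_grad(S[i, c] + delta[c], T[c])[0]
                          for i in range(S.shape[0]) for c in range(S.shape[1])]))
```

Since a single $\delta_c$ is shared by all subregions, the optimizer can cancel only the component common to them (the intrinsic pulse) while adding the target; a larger λ shrinks $\delta$, trading correlation with the target for a smaller perturbation.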

C. Skin Pixel Adjustment
The goal of this module is to map the perturbation signal $\delta \in \mathbb{R}^{3 \times N}$, a time series, to spatio-temporal perturbation frames $\Delta \in \mathbb{R}^{h \times w \times 3 \times N}$, where $h$ and $w$ refer to the height and width of the frames in pixels. We denote by $\delta_c(n)$ the perturbation of color channel $c$ in the $n$-th frame. A simple and intuitive approach is to add $\delta(n)$ directly to every skin pixel in the facial region of the $n$-th frame of the input video. Because pixel values are quantized to integers, the fractional part of $\delta(n)$ needs special treatment to ensure that the pixel values collectively change by the expected amount.
We adopt randomized dithering of skin pixels to achieve fractional perturbation in a statistical sense. Specifically, for color channel $c$ in the $n$-th frame, we adjust each skin pixel by $\lceil \delta_c(n) \rceil$ with probability $p$ or by $\lfloor \delta_c(n) \rfloor$ with probability $1 - p$, where $p$ is chosen so that the expected adjustment equals $\delta_c(n)$:
$$p \lceil \delta_c(n) \rceil + (1 - p) \lfloor \delta_c(n) \rfloor = \delta_c(n), \qquad (7)$$
which yields $p = \delta_c(n) - \lfloor \delta_c(n) \rfloor$. Algorithm 1 presents the detailed procedure of skin pixel adjustment to generate the final PulseEdit video.
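The dithering rule in (7) can be sketched for one frame as follows. The function name and the int16 working type are illustrative choices; clipping to [0, 255] is an added assumption for 8-bit frames, and it would bias the expected shift for pixels that sit at the ends of the intensity range.

```python
import numpy as np


def dither_adjust(frame, skin_mask, delta_n, rng):
    """Add per-channel fractional offsets delta_n (shape (3,)) to the skin pixels of a uint8 frame.

    Each skin pixel in channel c is shifted by ceil(delta_n[c]) with probability
    p = delta_n[c] - floor(delta_n[c]) and by floor(delta_n[c]) otherwise, so the
    expected shift equals delta_n[c].
    """
    out = frame.astype(np.int16)                 # headroom for negative/overflowing values
    n_skin = int(skin_mask.sum())
    for c, d in enumerate(delta_n):
        lo = np.floor(d)
        p = d - lo
        step = lo + (rng.random(n_skin) < p)     # floor(d) or floor(d) + 1, per pixel
        out[..., c][skin_mask] += step.astype(np.int16)
    return np.clip(out, 0, 255).astype(np.uint8)
```

For example, with $\delta_c(n) = 0.3$, about 30% of the skin pixels in channel $c$ are raised by one level and the rest are unchanged, so the spatial average used by rPPG methods moves by about 0.3 even though no individual pixel takes a fractional value.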

IV. EXPERIMENTAL RESULTS
In this section, we present experimental results on the PURE dataset [35] to demonstrate the effectiveness and robustness of PulseEdit in editing rPPG signals in facial videos. To examine the forensic detectability of PulseEdit when it is used as a potential attack, we test PulseEdit videos against digital forensic analysis. Lastly, we compare PulseEdit with a prior rPPG removal method [30] and study the influence of different subject motion settings during video recording on the performance of rPPG removal.
Fig. 3. Video frames and corresponding rPPG spectrograms (panels include "Original" and "rPPG removal"). We set the target HR = 120 bpm in the rPPG modification mode. The x-axis and y-axis denote time and heart rate (30 bpm to 180 bpm), respectively. The red lines in the spectrograms of the original video indicate the reference HR from the pulse oximeter, and the black dashed lines in the spectrograms of the rPPG-modified videos indicate the target HR = 120 bpm.
In this paper, we set M = 6 × 6 = 36 and use Adam [34] to solve (6) with a learning rate of 0.1 for 200 iterations.

A. Performance on PURE dataset
The PURE dataset [35] contains 60 facial video recordings of 10 subjects, captured at 640 × 480 pixel resolution and 30 frames per second (fps) in well-lit rooms. Each subject was recorded in 6 different setups: steady, talking, slow translation, fast translation, small rotation, and medium rotation. The videos were stored without lossy compression. To validate the effectiveness of PulseEdit in editing rPPG signals in facial videos, we analyzed the PulseEdit outputs of the PURE videos with three widely cited rPPG algorithms: ICA [13], CHROM [2], and POS [4]. In this part of the experiments, we extracted the rPPG signal from the whole facial region to estimate HR, and evaluated the performance using the mean absolute error (MAE). For the rPPG removal mode, we computed the error between the HR estimated from the video and the reference HR from the pulse oximeter provided by the dataset. For the rPPG modification mode, we computed the error between the estimated HR and the target HR. We applied PulseEdit to the PURE videos in both the removal and modification modes. In the removal mode, we generated white Gaussian noise as the target rPPG signal T to remove the intrinsic rPPG signal in the original video. In the modification mode, we aimed to change the rPPG signal to HR = 120 bpm as an example, and generated a sinusoid of frequency 120 bpm as the target rPPG signal T for all color channels. To simulate the noise condition of rPPG signals, we added white Gaussian noise at signal-to-noise ratios of −10 dB, 0 dB, and −10 dB in the red, green, and blue channels, respectively, since the green channel generally contains the strongest PPG signal of the three channels [1].
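The target signals and the MAE metric described above can be sketched as follows. The unit-variance noise in the removal mode and the definition of SNR as mean signal power over noise variance are assumptions, as are the function names. Because the target enters the editing objective only through the Pearson correlation, its overall scale does not affect the result.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power = 10**(snr_db / 10)."""
    noise_power = np.mean(signal**2) / 10.0 ** (snr_db / 10.0)
    return signal + rng.normal(0.0, np.sqrt(noise_power), signal.shape)


def make_target(n_frames, fps, mode, rng, target_bpm=120.0, snr_db=(-10.0, 0.0, -10.0)):
    """Target rPPG signal T with shape (3, n_frames).

    mode="removal": white Gaussian noise in every channel.
    mode="modification": a sinusoid at target_bpm in every channel, plus
    per-channel (R, G, B) white noise at the given SNRs.
    """
    if mode == "removal":
        return rng.normal(0.0, 1.0, (3, n_frames))
    t = np.arange(n_frames) / fps
    clean = np.sin(2.0 * np.pi * (target_bpm / 60.0) * t)
    return np.stack([add_noise_at_snr(clean, s, rng) for s in snr_db])


def mae(estimates, references):
    """Mean absolute error (bpm) between estimated and reference (or target) HR values."""
    return float(np.mean(np.abs(np.asarray(estimates) - np.asarray(references))))
```

For instance, HR estimates of 70 and 80 bpm against references of 72 and 77 bpm give an MAE of 2.5 bpm.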
We study the effect of different values λ ∈ {0, 0.1, 0.5, 1, 2, 5}, which governs the perturbation budget in the facial video, on the performance of PulseEdit. To investigate the robustness of PulseEdit against lossy video compression, we compressed the edited frames into MPEG-4 format at an average bitrate of around 500 kbps. Fig. 3 shows a qualitative comparison of the video frames and the corresponding rPPG spectrograms for different λ. Fig. 4(a) and (b) present the performance of HR estimation before and after PulseEdit in the removal and modification modes, respectively.
In the removal mode, we aim to increase the HR estimation error with respect to the reference HR, and Fig. 4(a) shows that the error increases as λ decreases. When λ is less than 0.5, the rPPG-removed videos have a very large estimation error (i.e., > 10 bpm), indicating that PulseEdit successfully removes the intrinsic rPPG signal. In the modification mode, our goal is to reduce the HR estimation error with respect to the target HR, and Fig. 4(b) shows that the error decreases as λ decreases. When λ is less than 0.5, the HR estimates of the rPPG-modified videos are very close to the target HR, with an error of no more than 1 bpm for uncompressed videos and 10 bpm for MPEG-4 videos. This suggests that PulseEdit can effectively transform the rPPG signal in a video to a target HR. From Fig. 3, we observe that as λ increases from 0 to 5, the original rPPG signals gradually reappear in the spectrograms of the edited videos. This indicates that a sufficient editing budget (a smaller λ) is needed to conceal the original rPPG signal.
Since lossy compression may attenuate the rPPG signal on the face, the HR error is expected to be larger in MPEG-4 videos than in uncompressed videos. Specifically, in the rPPG modification mode, the HR error with respect to the target HR is larger in the MPEG-4 videos than in the uncompressed ones. Nevertheless, the modified rPPG signal at the target HR can still be detected by the rPPG methods within an acceptable error range when λ < 0.5. In comparison, lossy compression has less impact on the rPPG removal mode. Overall, these results indicate that although lossy compression can weaken the manipulations introduced by PulseEdit, the protection of the intrinsic rPPG signal remains effective with a properly chosen λ.
An important observation is that the three rPPG methods show similar HR estimation performance on the PulseEdit videos, indicating that PulseEdit is effective against various rPPG algorithms. This satisfies the "generality" requirement. Fig. 4(c) shows the objective image quality assessment of the PulseEdit videos within a 300 × 300 facial ROI. Since a larger λ penalizes the perturbation more strongly, the frame-level PSNR increases as λ increases. By visual inspection, we can hardly notice any distortion of the person's appearance in Fig. 3.
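For reference, frame-level PSNR for 8-bit frames is commonly computed as $10\log_{10}(255^2/\mathrm{MSE})$; the sketch below assumes this definition over the cropped 300 × 300 ROI, and the function name is illustrative. As a worked number, a uniform shift of one intensity level gives MSE = 1 and PSNR = $20\log_{10}255 \approx 48.13$ dB. Under the dithering in (7), a channel with $|\delta_c(n)| < 1$ has per-pixel MSE equal to $|\delta_c(n)|$, so such perturbations stay above about 48 dB even before accounting for untouched non-skin pixels.

```python
import numpy as np


def psnr(original: np.ndarray, edited: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two images or ROIs of equal shape."""
    mse = np.mean((original.astype(np.float64) - edited.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                       # identical inputs
    return float(10.0 * np.log10(peak**2 / mse))
```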
Running time. Overall, PulseEdit runs efficiently. On average, the rPPG extraction step runs at around 10 fps, the rPPG editing step at 170 fps, and the skin pixel adjustment step at around 100 fps. These running times were measured using a single-core Python implementation on a PC with an Intel Core i5-4440 processor. Since the speed of the rPPG extraction step depends heavily on facial landmark detection, the running time can be further reduced by optimizing facial landmark detection.

B. rPPG Analysis on Multiple Facial Subregions
To examine the universality of PulseEdit, we analyze the presence of rPPG signals in three facial subregions, namely the forehead and the left and right cheeks, shown in Fig. 6. The regions are detected automatically from the facial landmarks. Fig. 5 presents the performance of HR estimation from the three facial subregions using three rPPG algorithms. Since a larger ROI generally gives better average rPPG quality [36], we expect lower HR estimation accuracy from the facial subregions alone than from the whole face region.
From Fig. 5, we observe that the HR error from the three facial subregions follows a trend similar to that from the whole face region under different λ values. For the rPPG-removed videos, the error is much larger than for the original videos when λ is less than 0.5. This suggests that the intrinsic rPPG signals are completely erased in all three facial subregions. For the rPPG-modified videos, the HR error with respect to the target HR is within an acceptable range when λ is less than 0.5, showing that the rPPG signals in all three facial subregions are successfully transformed to the target HR. In summary, these results indicate that PulseEdit can effectively edit the rPPG signals not only in the global facial region but also in local facial subregions, which satisfies the "universality" requirement.

C. PulseEdit against Forensic Analysis
The preceding performance analysis shows that PulseEdit is effective in editing the intrinsic rPPG signals in facial videos for privacy protection. As noted in Section I, forgers may also use PulseEdit in adversarial scenarios. In this subsection, we examine the effect of forensic analysis tools to better understand the strengths and limitations of PulseEdit.
PulseEdit perturbs the skin pixels in the video frames by a small amount to edit rPPG signals, which resembles how steganography manipulates images. From this point of view, we examine the forensic detectability of PulseEdit against two representative steganalysis methods: the spatio-color rich model (SCRM) [37] with ensemble training [39], and the deep-learning-based WISERNet [38]. Since PulseEdit only edits the facial regions, we cropped a 300 × 300 facial ROI. We set the original video frames as negative and the PulseEdit video frames as positive, and used 5-fold cross-validation to evaluate performance. For the deep model, we adjusted the size of the feature maps in the intermediate layers to accommodate the 300 × 300 input. We observe that the steganalysis models are most effective on uncompressed video frames: their detection performance reaches an area under the ROC curve (AUC) of 0.99+ for every λ value, so they can almost perfectly differentiate the original video frames from the frames edited by PulseEdit. Without additional constraints, the randomized pixel adjustment in Section III-C perturbs the skin pixels in a frame independently, introducing artificial changes among neighboring pixels that are not present in the direct output of video cameras. Such unconstrained distortion can easily be captured by various image forensic models and used to discriminate between natural and edited images [40]- [42]. Fig. 7 presents the steganalysis results on the lossily compressed videos. Compared with the uncompressed videos, the steganalysis performance on the MPEG-4 videos degrades noticeably. Of the two steganalysis models, the deep model detects the manipulation traces in the lossily compressed videos better than the classic model. We also find that the steganalysis performance on the lossily compressed videos decreases significantly for both forensic methods as λ increases. This suggests that lossy compression can reduce the detectability of the manipulation traces introduced by PulseEdit.
In its current form, PulseEdit focuses on altering the rPPG information for privacy protection and does not explicitly conceal the traces of manipulation. As a result, the perturbation can be detected in uncompressed frames by forensic tools such as steganalysis. Because such forensic analysis is limited on lossily compressed frames, and because PulseEdit's perturbation is small and random by design, lossily compressed PulseEdit videos can evade forensic steganalysis while remaining effective in concealing or modifying the intrinsic rPPG information. Forensic-undetectability constraints could be further incorporated into the algorithm, both to gain insight into the ability of PulseEdit as an anti-forensic tool and to inform the competing direction of detecting the manipulations made by PulseEdit.

D. Performance Summary of PulseEdit
Taking into consideration the HR estimation error in PulseEdit videos, perceptual distortion, and resistance against forensics, we choose λ = 0.5 for PulseEdit and use it in the following experiments. Table I summarizes the experimental results of PulseEdit with λ = 0.5. The first three macro-rows show the MAE of HR estimation (unit: bpm) using different rPPG algorithms. Note that we compute the MAE between the estimated HR and the reference HR from the pulse oximeter in the rPPG removal mode, and between the estimated HR and the target HR = 120 bpm in the rPPG modification mode. The next row shows the frame-level perceptual distortion within the facial ROI between the original and edited videos. The last two rows present the forensic analysis of PulseEdit. Table I shows that, after PulseEdit, the HR estimation error with respect to the reference HR increases in the rPPG removal mode, and the HR estimation error with respect to the target HR decreases significantly in the rPPG modification mode. This indicates that PulseEdit can effectively remove/modify rPPG information both over the whole face and in local subregions, as tested by various rPPG methods. The high PSNR suggests that PulseEdit introduces hardly any perceptual distortion to the subject's appearance. Comparing the HR estimation errors of the uncompressed and MPEG-4 videos, we can see that lossy compression can weaken the manipulation applied to the facial videos, but PulseEdit can still edit the rPPG signals to some extent. From the anti-forensic perspective, lossy compression reduces the AUC by more than 0.4 for SCRM and by more than 0.2 for WISERNet. This indicates that lossy compression can greatly help PulseEdit videos evade forensic analysis.

E. Comparison with Prior Art
We compare the proposed PulseEdit in rPPG removal mode with the prior method of Chen et al. [30] (Chen's method). We report the HR estimation error on the facial videos using the three rPPG methods: ICA, CHROM, and POS. The performance is evaluated on the uncompressed videos. Fig. 8 presents bar plots comparing the performance of PulseEdit and Chen's method. We study the influence of 6 motion settings on the rPPG removal methods: steady, talking, slow translation, fast translation, small rotation, and medium rotation.
From Table II, we can see that the three rPPG methods accurately estimate HR from the original videos. Given that rPPG signals can be extracted accurately from the original videos, both Chen's method and PulseEdit increase the average HR estimation error, but PulseEdit removes the intrinsic rPPG signal more completely, leading to a larger increase in HR error. The PSNR indicates that PulseEdit introduces less distortion into the video frames than Chen's method. When we analyze the different motion settings in Fig. 8, we observe that Chen's method performs similarly to ours in the steady case but does not perform well in the talking, head translation, and head rotation cases. This suggests that Chen's method is not effective in handling head motion, since voluntary motion can easily overwhelm the subtle HR-induced motion in the video. In comparison, PulseEdit shows little variation in performance across all 6 motion settings, indicating that it is effective in a variety of motion settings.

V. ANALYSIS OF ADVERSARIAL SCENARIOS
Since PulseEdit can edit rPPG signals in videos, we expect that PulseEdit, used as an adversarial operation, can circumvent rPPG-based liveness detection [16], [18] and rPPG-based deepfake detection [19]. We therefore conducted experiments on the HKBUMARsV1+ dataset [17] for liveness detection and on the Celeb-DFv1 dataset [43] for deepfake detection to evaluate the effectiveness of PulseEdit in these two scenarios.

A. Analysis against rPPG-based Liveness Detection
Liveness detection aims to determine whether the person seen by a camera is presenting his/her true live appearance or is using a camouflaging mask with a different facial appearance, a profile photo, or a video replay, in order to prevent face spoofing in identity authentication. Live faces and many spoofed faces often have different rPPG characteristics. As a proof of concept, we test two rPPG-based liveness detection methods, namely Hernandez's method [18] and Liu's method [16], to analyze the performance of PulseEdit in circumventing rPPG-based methods.
We conducted experiments on the HKBUMARsV1+ dataset [17], which consists of video recordings of 12 subjects, both in the flesh and wearing 3D face masks of different appearances. We labeled live facial videos as negative and 3D mask videos as positive. The classifier settings are the same as in [16], [18]. PulseEdit was applied to the 3D mask videos, with the target rPPG signals generated using the same procedure as in the rPPG modification mode in Section IV-A. We used subject-independent 5-fold cross-validation to evaluate the detectors on the videos before and after PulseEdit.
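Subject independence means that all videos of a given subject fall into the same fold, so a detector is never tested on a person it was trained on. A minimal sketch, assuming one subject ID per video; the function name and the round-robin assignment of sorted subjects to folds are hypothetical choices for illustration, not the split used in the experiments.

```python
def subject_independent_folds(subject_ids, n_folds=5):
    """Yield (train_idx, test_idx) pairs in which no subject is in both splits.

    subject_ids[i] is the subject of video i. Subjects are assigned to folds
    round-robin in sorted order (a hypothetical, deterministic choice).
    """
    subjects = sorted(set(subject_ids))
    if len(subjects) < n_folds:
        raise ValueError("need at least as many subjects as folds")
    fold_of = {sid: k % n_folds for k, sid in enumerate(subjects)}
    for fold in range(n_folds):
        test_idx = [i for i, sid in enumerate(subject_ids) if fold_of[sid] == fold]
        train_idx = [i for i, sid in enumerate(subject_ids) if fold_of[sid] != fold]
        yield train_idx, test_idx
```

With 12 subjects and 5 folds, this yields folds of 3, 3, 2, 2, and 2 subjects. scikit-learn's `GroupKFold`, with subject IDs passed as `groups`, provides the same no-overlap guarantee.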
We report the equal error rate (EER) and AUC in Table III to show the impact of PulseEdit on the rPPG-based liveness detection algorithms. The EER is the error rate at the operating point where the false positive rate equals the false negative rate, and the AUC is the area under the receiver operating characteristic (ROC) curve. A smaller EER and a larger AUC indicate better detection ability. PulseEdit increases the EER from 0.29 to 0.40 and decreases the AUC from 0.77 to 0.31 for Hernandez's method [18], and increases the EER from 0.10 to 0.26 and decreases the AUC from 0.94 to 0.73 for Liu's method [16]. Averaged over the 5 folds, 96% of the 3D mask videos correctly classified by Hernandez's method, and 64% of those correctly classified by Liu's method, are classified as live after PulseEdit is applied. These results suggest that the current form of PulseEdit can already circumvent rPPG-based liveness detection to some extent, and that incorporating insights from existing liveness detection research may further strengthen this evasion.
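The two metrics defined above can be computed directly from detector scores. This sketch assumes a higher score means "more likely positive" (3D mask or deepfake) and that a sample is flagged positive when its score is at or above the threshold. AUC uses the Mann-Whitney formulation (ties count one half), and the EER is approximated at the empirical threshold where FPR and FNR are closest, averaging the two there. These are standard constructions, not necessarily the exact procedure behind Table III.

```python
def auc(pos_scores, neg_scores):
    """ROC AUC as P(pos score > neg score), counting ties as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def eer(pos_scores, neg_scores):
    """Approximate EER: average of FPR and FNR where they are closest."""
    best_gap, best_rate = float("inf"), 1.0
    # Candidate thresholds: every observed score, plus +inf (flag nothing).
    thresholds = sorted(set(pos_scores) | set(neg_scores)) + [float("inf")]
    for t in thresholds:
        fpr = sum(n >= t for n in neg_scores) / len(neg_scores)  # negatives flagged
        fnr = sum(p < t for p in pos_scores) / len(pos_scores)   # positives missed
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, best_rate = gap, (fpr + fnr) / 2
    return best_rate
```

An AUC below 0.5, as for Hernandez's method after PulseEdit, means the detector's scores rank edited mask videos below live ones more often than not.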

B. Analysis against rPPG-based Deepfake Detection
The fast development of deep learning has enabled computers to transform one person's face into another's in images and videos. Such "deepfake" videos can spread misinformation. As a proof of concept, we test FakeCatcher [19], an rPPG-based deepfake detector. We conducted experiments on the Celeb-DFv1 dataset [43], which consists of 370 real videos and 733 deepfake videos in the training set, and 38 real videos and 62 deepfake videos in the test set. We labeled real videos as negative and fake videos as positive, and trained the FakeCatcher CNN model on the training set. The CNN architecture is the same as in [19]. As shown in Fig. 9(c) and Table III, FakeCatcher achieves an EER of 0.29 and an AUC of 0.76 on the test set.
We applied PulseEdit to the deepfake videos in the test set, using the rPPG signals extracted from the corresponding real videos as the target rPPG signals; in other words, we attempted to restore the original rPPG signals in the deepfake videos. On the edited test set, the EER of FakeCatcher increases to 0.47 and its AUC drops to 0.57, indicating that the rPPG signals inserted by PulseEdit can circumvent FakeCatcher and lead it to accept deepfake videos as genuine. Among the test-set deepfake videos that FakeCatcher correctly classifies as forgeries, 49% are classified as unforged after PulseEdit is applied. These observations show that PulseEdit can degrade the reliability of the FakeCatcher classifier and mislead it into wrong decisions on deepfake videos.
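The flip percentages reported in this section (96% and 64% for liveness detection, 49% for FakeCatcher) are conditional rates: only positive videos that the detector flags before editing are counted. A minimal sketch, assuming boolean per-video decisions before and after PulseEdit aligned by index; the function name is hypothetical. For the liveness experiment, the rate would be computed per fold and then averaged, which is an assumption about how the average was taken.

```python
def evasion_rate(pred_before, pred_after):
    """Fraction of originally detected positives that are missed after editing.

    pred_before[i], pred_after[i]: detector decisions (True = flagged as
    mask/fake) for the same positive video i before and after PulseEdit.
    """
    detected = [i for i, flagged in enumerate(pred_before) if flagged]
    if not detected:
        raise ValueError("no positive video was detected before editing")
    flipped = sum(1 for i in detected if not pred_after[i])
    return flipped / len(detected)
```

Conditioning on correct detections isolates the effect of PulseEdit from the detector's baseline misses.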

VI. DISCUSSIONS
In terms of both running time and HR estimation error, PulseEdit is an effective algorithm for editing rPPG signals in facial videos. Whereas the prior art [30] focuses only on eliminating rPPG information, we have designed PulseEdit with two modes: rPPG removal and rPPG modification. The former removes the rPPG information, and the latter changes it to a target HR specified by the user. PulseEdit thus offers users more options for editing the physiological signals in facial videos for physiological privacy protection. PulseEdit is also better at removing the physiological signal from videos with head motion (i.e., talking, translation, and rotation), making it more robust across practical recording conditions.
Considering PulseEdit as an adversarial operation against rPPG technology, we have studied to what extent PulseEdit can circumvent rPPG-based visual security algorithms. As a proof of concept, we consider rPPG-based liveness detection and deepfake detection algorithms. The experimental results demonstrate noticeable performance drops from the original videos to the PulseEdit videos, indicating that PulseEdit can successfully undermine rPPG-based visual security algorithms. From the perspective of threat modeling for these algorithms, our PulseEdit research suggests that it is important to investigate this and similar vulnerabilities and to make rPPG-based visual security algorithms more robust against adversarial operations.
Over the past decade, rPPG technology has advanced rapidly, and it has become feasible to monitor vital signs such as HR using commercial digital cameras in daily life. A common bottleneck in rPPG R&D is the lack of sufficient facial videos with known HR covering a wide range [44]. PulseEdit in rPPG modification mode may be used to synthesize facial videos with controllable HR to enlarge such datasets and facilitate rPPG R&D.
In its current form, a weakness of PulseEdit is that it focuses on altering the rPPG information in each frame and does not explicitly attempt to conceal the manipulation traces it introduces. Forensic tools such as steganalysis can detect the presence of perturbations in the uncompressed frames, if these are available. Nevertheless, we find that lossy video compression is a feasible way to improve the resistance of the edited frames against forensic analysis while retaining the edited rPPG signal in the video. In future work, incorporating various forensic undetectability criteria into the PulseEdit framework and developing new detectors for these manipulations could be two intertwined research directions. In addition, the current form of PulseEdit perturbs facial pixels independently; in the future, the algorithm could take the spatial and temporal correlations of facial pixels into account when perturbing them, to further minimize the perceptual distortion of facial videos.

VII. CONCLUSION
In this paper, we have proposed PulseEdit, a novel algorithm that edits the rPPG signal in facial videos without visible distortion to protect physiological information from disclosure. We design a set of perturbation frames that are added to the input video frames to change the person's intrinsic rPPG signal in the facial region. PulseEdit can either remove the rPPG signals from the face or change them to a target heart rate. Extensive experimental results demonstrate the effectiveness and robustness of PulseEdit in different facial subregions: after PulseEdit is applied, various rPPG algorithms can no longer accurately estimate the heart rate from facial videos. We also show that PulseEdit can potentially circumvent rPPG-based liveness detection and deepfake detection, pointing to directions for improving these methods. In its current form, the traces of PulseEdit can be detected by forensic steganalysis on uncompressed video frames, but lossy video compression significantly reduces the forensic detection performance. The proposed work can be extended by incorporating various forensic detectability criteria into the algorithm, to gain insight into the capability of PulseEdit as an antiforensic tool, and by pursuing the competing direction of detecting the manipulations made by PulseEdit.