FamilyLog: A Mobile System for Monitoring Family Mealtime Activities

Proc IEEE Int Conf Pervasive Comput Commun. Author manuscript; available in PMC 2017 Aug 16.

PMCID: PMC5558883

NIHMSID: NIHMS894469

Chongguang Bi

^*Michigan State University

Guoliang Xing

^*Michigan State University

Jina Huh

^‡University of California, San Diego

Wei Peng

^*Michigan State University

Mengyan Ma

^*Michigan State University

Abstract

Research has shown that family mealtime plays a critical role in establishing good relationships among family members and maintaining their physical and mental health. In particular, regularly eating dinner as a family significantly reduces the prevalence of obesity. However, American families with children spend only 1 hour on family meals but 3 hours watching TV on an average work day. Fine-grained activity logging has proven effective for increasing self-awareness and motivating people to modify their lifestyles for improved wellness. This paper presents FamilyLog, a practical system to log family mealtime activities using smartphones and smartwatches. FamilyLog automatically detects and logs details of activities during the mealtime, including the occurrence and duration of meals, conversations, participants, TV viewing, etc., in an unobtrusive fashion. Based on the sensor data collected from real families, we carefully design robust yet lightweight signal features from a set of complex activities during the meal, including clattering sound, arm gestures of eating, human voice, TV sound, etc. Moreover, FamilyLog opportunistically fuses information from built-in sensors of multiple mobile devices available in a family through an HMM-based classifier. To evaluate the real-world performance of FamilyLog, we perform extensive experiments that consist of 77 days of sensor data from 37 subjects in 8 families with children. Our results show that FamilyLog can detect those events with high accuracy across different families and home environments.

I. Introduction

Research has shown that the family mealtime plays a critical role in establishing good relationships among family members and maintaining their physical and mental health [5][9][7]. In addition to the implications for family wellness, fine-grained analysis of family mealtime enables important studies in sociology and home economics. For instance, research has shown that the amount of shared time (including conversation and eating) between spouses and between parents and children has strong links with family income, the mother's employment status, the ages of children, and geographic location (urban or rural) [6][13][12]. However, according to a national survey in 2014, American families with children on an average work day spend about 3 hours watching TV, accounting for more than half of their leisure and sport time, but only one hour on family meals [24].

It has been shown that activity logging is a very effective approach to improving self-awareness and motivating people to change their behaviors toward a healthy lifestyle [11]. Unfortunately, to date, there have been no unobtrusive and convenient methods to log family meals and related activities. Some of the available methods for family activity monitoring rely on video-taping [8], which not only incurs considerable installation/analysis costs, but also raises privacy concerns. There have been a number of studies on activity recognition using personal wearables and smartphones [19] [33]. However, as we argue in this paper, detecting the activity of individual family members separately is insufficient for studying family communications, e.g., because young children are usually not allowed to carry personal devices.

This paper presents FamilyLog, the first practical system to log family mealtime activities using smartphones and smartwatches. FamilyLog uses the built-in accelerometer and microphone of the smartphone/smartwatch to detect mealtime activities that are closely related to family wellness, including the occurrence, duration, and participants of the family meal as well as conversations and TV viewing during the meal. By providing a detailed record of the family mealtime activities, FamilyLog empowers family members to actively engage in making positive changes to improve family health, e.g., preventing child obesity.

The design of FamilyLog faces several challenges, such as the significant interference from various noises in the home. Moreover, uploading sensor data to the cloud is often undesirable due to privacy concerns. To address these challenges, we carefully design several lightweight acoustic and motion features based on in-depth analysis of data sets from multiple families. Furthermore, FamilyLog employs novel HMM-based sensor fusion techniques to opportunistically leverage multiple built-in sensor modalities of mobile devices available in a family, which maximizes the spatiotemporal sensing coverage and achieves robust sensing accuracy across different homes. It can also shorten the system training period by incorporating one-time user input such as the typical time/frequency of family dinners. We have evaluated FamilyLog with extensive experiments involving 8 families with children (one or two weeks of recording in each family) and a total of 251 hours of sensor data collected over 77 days. Our results show the effectiveness of FamilyLog in family activity detection (with an average of 88.7% precision and 93.3% recall for meal detection, and 97.8% precision and 92.8% recall for participant identification) across different families and home environments. The long-term, fine-grained family activity history provided by FamilyLog makes it possible to analyze communication patterns/anomalies and improve family lifestyles.

II. Related Work

Studies by the American Academy of Pediatrics have shown that healthy family meals are not only helpful in establishing good relationships among family members, but also critical for the proper development of children's physical and mental health [5][9][7][14]. In order to monitor family mealtime activities, several systems are designed to detect the usage of electrical appliances based on electromagnetic interference and ambient sensors [26][15][18]. However, these systems can only detect the activities that involve substantial appliance usage. Recently, activity monitoring using mobile devices has received significant attention. Several systems are designed to detect food and drink intake. For example, [19] presents the design of a fork with sensing abilities to help track and improve the user's eating behaviors. In [33], the authors propose an approach for profiling the user's gestures while eating using motion sensors on smartwatches. However, these systems are focused on tracking the eating behavior of individuals, and are not suitable for detecting family mealtime activities, which may involve children not wearing any devices, and conversations among family members. Moreover, some mobile health systems are designed based on off-the-shelf smartphones to monitor human activities, such as sleep quality [16] or physical activities [28]. Several recent studies are focused on user experiences with mobile health systems, such as privacy concerns [1] and sharing behaviors [27]. However, these efforts are not concerned with studying family meals or group activities.

Acoustic event recognition algorithms have been widely adopted in smartphone-based activity monitoring systems. Auditeur [23] is designed as a mobile-cloud service platform that allows a client's smartphone to recognize various sound events such as car honks or dog barking. SoundNet associates environmental sounds with words or concepts in natural languages to infer activities [22]. Recent work shows that the eating activity can also be detected from acoustic features [34]. However, this work does not pinpoint primary features for detecting family meals. It requires a large amount of data, and employs complex signal processing and machine learning methods, which increase the burden of implementation on mobile devices.

In order to detect the participants in a conversation, Crowd++ [35] counts the number of speakers using MFCC (Mel-frequency cepstral coefficient) [29] features. The row mean vector of the spectrogram [20] is a simple but effective method for speaker recognition that compares the Euclidean distance of energy distribution features. However, voice recognition during a family meal is more challenging due to the presence of significant noise and requires new techniques.

III. Motivation and Requirements

A national survey shows that American families with children spend only 1 hour on family meals on a typical work day [24]. Moreover, it is shown that TV viewing during the meal significantly increases energy intake [2]. Based on the datasets we collected from 8 families, over 60% of the family meals are accompanied by concurrent TV viewing. In addition to the occurrence, duration, and frequency of family meals, the conversations during a family meal are also important, as they constitute a significant portion of communications between family members during a day. Analyzing the conversation during a family meal is also important for culture studies [4]. Moreover, it is shown that, by reviewing detailed activity logs, people are motivated to modify their behaviors toward a healthy lifestyle [11][6][13][12].

There have been a number of studies on personal activity recognition using wearables and smartphones [19] [33]. However, we argue that detecting the activity of individual family members separately is insufficient. First, the existing solutions typically require the mobile device (smartphone or wearable) to be carried by the user. As a result, they cannot be applied to detect many activities of young children, who are usually not allowed to carry personal devices. Second, many people do not carry a smartphone or wear a watch constantly at home, making it difficult to monitor one's activity continuously. Moreover, detecting each individual's behavior is often unnecessary or significantly more challenging when she/he is participating in a group activity. For instance, detecting whether a particular family member is eating based on sound is more difficult when the family is having dinner together, due to the higher level of ambient noise.

FamilyLog is designed to be an unobtrusive system that helps users keep track of their family mealtime activities. It employs the built-in accelerometer and microphone of smartphones and smartwatches to detect various information and activities related to a family meal. Specifically, FamilyLog is designed to meet the following requirements: 1) Since FamilyLog needs to operate in parallel with family mealtime activities, it must be unobtrusive to use. It should minimize the burden on the user, e.g., without requiring the users to carry extra devices, and should not interfere with the users' daily activities by any means. 2) FamilyLog needs to monitor the details of family meals, including their start/end time, participants, and possible TV viewing, in a robust manner, i.e., across different users, smartphones, smartwatches and households. 3) Since family meals involve privacy-sensitive activities such as family conversation, the privacy of the family needs to be strictly protected. For example, the system should process the collected sensor samples on the fly and only keep the results, instead of storing or transmitting any raw data, which may contain sensitive data such as the contents of the conversations. The sensing algorithms we develop can accurately classify a number of important contextual features of activities such as arm gestures from wearables, eating sounds, environmental noise, conversations, etc. As a result, in the future, these algorithms can be adapted and used as building blocks to detect a wide range of family activities such as parties, family meetings, gaming, etc.

IV. System Design

FamilyLog detects family meals by using the built-in sensors of mobile devices, namely the microphone on a smartphone/tablet and both the microphone and accelerometer on a smartwatch^¹. However, FamilyLog is designed to leverage these sensing modalities in an opportunistic manner depending on the availability of mobile devices in a home. In particular, FamilyLog may achieve satisfactory sensing performance even with a single smartphone when it is placed in the proximity of family activities (see Section V). When multiple devices are available, FamilyLog runs separately on each individual device and fuses the detection results to achieve better performance and extended coverage.

As shown in Fig. 1, FamilyLog consists of four components: pre-processing, acoustic feature extraction, motion feature extraction, and HMM-based activity classification.

In pre-processing, sensors are sampled at a certain rate and the samples are framed. A frame is discarded if it only contains noise, which is indicated by low variance. Otherwise, each acoustic frame is processed to extract energy features using filters based on Mel-frequency cepstrum coefficients (MFCC). In the acoustic and motion feature extraction components, FamilyLog groups data frames (50 ms by default) into a detection window (3 min by default), and extracts a set of distinct features for each window. Specifically, FamilyLog extracts gesture-related motion features, such as the average X-axis acceleration and changing rate, and acoustic features to detect the clattering sounds and the human voice.

To detect activities from extracted features, FamilyLog adopts an HMM-based (Hidden Markov Model) classifier. Compared with several commonly used classifiers like Support Vector Machine (SVM) that are only applicable to discrete event detection, HMM can naturally capture the temporal pattern of family activities by incorporating continuous sensor input. The HMM classifier is trained on a combination of a short period of sensor data, e.g., one day of family activities labeled by users, and some general knowledge of family meals, which can be obtained from a one-time user input or a brief survey with simple questions such as "how much time does your weekday dinner usually take?".

A. Pre-processing

The main objective of pre-processing is to reduce unnecessary computation and prepare data for feature extraction. Specifically, it consists of the following three components.

First, FamilyLog reduces unnecessary computation by discarding detection windows that likely contain only environmental noise (e.g., noise from appliances). Specifically, the noise detection is achieved by first computing the root mean square (RMS) (i.e., the volume of the signal) for each frame, and then computing the variance of the RMS of all the frames within each window. A key observation is that a window with low RMS variance only contains ambient noise. Similarly, FamilyLog discards the motion data with low RMS variance, which typically indicates a stationary smartwatch not worn by the user.
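The noise check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the threshold `var_threshold` is a hypothetical tuning parameter (the paper does not give its value), and the population variance is used because the text does not say which estimator is applied.

```python
import math
from statistics import pvariance

def frame_rms(frame):
    """Root mean square (the 'volume') of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def is_noise_window(frames, var_threshold):
    """Return True if the per-frame RMS varies too little to be an activity.

    var_threshold is a hypothetical parameter, not a value from the paper.
    """
    rms_values = [frame_rms(f) for f in frames]
    return pvariance(rms_values) < var_threshold

# Steady appliance hum: every frame has the same volume.
steady = [[0.1, -0.1, 0.1, -0.1]] * 6
# Bursty activity: quiet and loud frames alternate.
bursty = [[0.1, -0.1, 0.1, -0.1], [0.9, -0.9, 0.9, -0.9]] * 3
```

With these toy inputs, `is_noise_window(steady, 0.01)` flags the window for discarding, while the bursty window is kept for feature extraction.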

Second, to increase computational efficiency, FamilyLog represents acoustic data with MFCC-based features, which will be used in later feature extraction. For each frame, FamilyLog first calculates its energy spectrum from 80 Hz to 8 kHz with the Fast Fourier Transform (FFT) [32]. Then the resulting spectrum is transformed into 21 energy channels by applying Mel filters [21][25][30]. The energy of channel i will be represented as e_i hereafter.

Third, to preserve power, FamilyLog turns on sensor sampling only when the device is at home, which can be determined from the system location. Moreover, as an optional feature, FamilyLog can start the sensor sampling of a new detection window probabilistically, based on the percentage of historical noise frames in a predefined time window. We note that this strategy may turn off sampling falsely when noise appears in a burst within an event of interest. In the future, we will take into account the feedback from the event detection component and reduce the sensor sampling when no activity is detected.

B. Feature Extraction

FamilyLog identifies the occurrence of family meals by several key characteristics, based on the sounds and gestures associated with dining and on whether the family members are currently in close proximity to one another. Specifically, we use the following features to characterize family meals. The first feature is the clattering sound caused by clashes between tableware. The clattering sound is the most distinctive acoustic characteristic of family dining activity, regardless of other dynamics such as the type of food and variation in tableware. The second feature is the gesture of the users captured by smartwatches. When the user is holding food or using tableware, the arm of the user often exhibits a certain pattern of movements. The third feature is the human voice, i.e., the conversation between family members, which implies that the family members are near each other.

1) Clattering Sound

To infer family meal events, FamilyLog calculates the occurrences and frequency of clattering sound within a detection window. It looks for an energy peak from channel 12 to 16 (associated with frequencies ranging from 1 to 4 kHz) in each 50 ms frame. Specifically, for each frame, it computes $\bar{e}_{all}$, the average energy over all channels, and $\bar{e}_{12-16}$, the average energy across channels 12 to 16. The feature associated with clattering sound is calculated as $r = \bar{e}_{12-16} / \bar{e}_{all}$. Fig. 2 shows an example of clattering sound detection in a typical family meal scenario. Fig. 2(a) shows the energy on 21 channels over time, and Fig. 2(b) shows the corresponding $\bar{e}_{12-16}$ and $\bar{e}_{all}$. We can see that one occurrence of clattering sound may result in several continuous clattering frames with higher $\bar{e}_{12-16}$, even when the clattering sound and human voice overlap at around 1 second. Therefore, comparing $\bar{e}_{12-16}$ and $\bar{e}_{all}$ is a simple and effective way of detecting clattering sound in typical family meal scenarios. After obtaining r for each acoustic frame, FamilyLog calculates E[N_clattering], which represents the expected number of frames containing clattering sound in a detection window. Specifically, E[N_clattering] is calculated as the sum of P(clattering|r) over the frames, where P(clattering|r) is preset in the system and generated using the data collected from five families. Fig. 3 shows an example of clattering sound detection based on the real data set collected in a home. We can see that all family meal windows contain large numbers of clattering frames. The clash of other objects such as keys and coins can also produce a similar sound. Unlike the clattering frames of dining activity, such false alarms are usually isolated and not likely to occur in a burst.
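As a rough illustration of the clattering feature, the sketch below computes r for one frame and sums a per-frame probability over a window. The function `p_clattering` is a hypothetical linear ramp standing in for the preset P(clattering|r), whose actual shape the paper derives from family data and does not specify; the bounds `r_lo` and `r_hi` are likewise assumptions.

```python
def clattering_ratio(energies):
    """r = mean energy of channels 12-16 / mean energy of all channels.

    energies: the 21 Mel channel energies e_1..e_21 of one frame.
    """
    e_all = sum(energies) / len(energies)
    band = energies[11:16]  # channels 12..16 in 1-indexed terms
    e_band = sum(band) / len(band)
    return e_band / e_all if e_all > 0 else 0.0

def p_clattering(r, r_lo=1.0, r_hi=3.0):
    """Hypothetical stand-in for the preset P(clattering | r): a linear ramp."""
    return min(1.0, max(0.0, (r - r_lo) / (r_hi - r_lo)))

def expected_clattering_frames(window):
    """E[N_clattering]: sum of P(clattering | r) over the frames of a window."""
    return sum(p_clattering(clattering_ratio(e)) for e in window)

flat = [1.0] * 21                               # no peak in channels 12-16
peaked = [1.0] * 11 + [9.0] * 5 + [1.0] * 5     # strong peak in channels 12-16
```

A window made of one flat frame and two peaked frames yields an expected count of 2 under this stand-in probability, since the flat frame has r = 1 and the peaked frames exceed `r_hi`.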

Fig. 2. An example of clattering sound detection in a typical family meal scenario. (a) shows the energy on 21 channels over time, where clattering sound and human voice are marked with rectangles. (b) shows the comparison between $\bar{e}_{12-16}$ and $\bar{e}_{all}$ for the same sound clip.

Fig. 3. An example of family meal detection. Each bar represents the expected number of frames containing clattering sound in a detection window.

2) Arm Gesture

When a smartwatch is available, FamilyLog also extracts motion-based features that characterize dining behavior, which include the acceleration on the X-axis ($\bar{Acc}_{x}$) and the changing rate of the acceleration ($\bar{R}_{c}$). The X-axis acceleration is sensitive to various arm movements, as its direction is always parallel to the user's arm. Therefore, it can be used as a simple and effective feature for inferring arm gestures while avoiding the overhead of data processing on the other two dimensions. Specifically, FamilyLog samples the built-in motion sensor on the smartwatch and calculates two features for each frame. The X-axis acceleration is directly read from the accelerometer. The changing rate between two frames is computed as the angle between their two acceleration vectors. Since the measured acceleration mostly corresponds to gravity, the angle describes how much the orientation of the watch face turns with the user's action. For a detection window, $\bar{Acc}_{x}$ is calculated as the average acceleration on the X-axis over all frames, and $\bar{R}_{c}$ is calculated as the average changing rate over all neighboring frames. Fig. 4 shows three typical activities and the motion features. We can see that the arm gesture and the movements of the wrist during a meal show distinct distributions.
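The two motion features can be sketched as follows. The sketch assumes one (x, y, z) acceleration vector per frame, which is a simplification; how FamilyLog aggregates raw accelerometer samples within a frame is not stated in the text.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two 3-axis acceleration vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 0.0
    cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cos))

def motion_features(frames):
    """Return (mean X-axis acceleration, mean changing rate) for one window.

    frames: list of (x, y, z) accelerations, one vector per frame (assumed).
    """
    acc_x = sum(f[0] for f in frames) / len(frames)
    angles = [angle_deg(a, b) for a, b in zip(frames, frames[1:])]
    r_c = sum(angles) / len(angles) if angles else 0.0
    return acc_x, r_c

# A wrist that flips between two orthogonal orientations every frame.
flipping = [(9.8, 0.0, 0.0), (0.0, 9.8, 0.0), (9.8, 0.0, 0.0)]
# A watch lying still on a table.
still = [(0.0, 0.0, 9.8)] * 4
```

For the still watch the changing rate is essentially zero, while the flipping wrist produces a 90 degree change between every pair of neighboring frames.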

Fig. 4. Examples of typical activities and related motion features. The left column shows a dining scenario ($\bar{Acc}_{x} = 2.69\ m/s^{2}$, $\bar{R}_{c} = 10.41°$). The middle column shows a TV viewing scenario ($\bar{Acc}_{x} = 4.25\ m/s^{2}$, $\bar{R}_{c} = 0.85°$). The right column shows a walking/standing scenario ($\bar{Acc}_{x} = -6.98\ m/s^{2}$, $\bar{R}_{c} = 3.19°$). The upper row shows the ground truth at some moments during these activities, and the arrows in these photos indicate the direction of the X-axis. The acceleration on the X-axis is shown for each photo. The middle row shows the acceleration on the X-axis in each frame. The lower row shows the changing rate of acceleration in each frame.

C. Conversation and TV Viewing Detection

1) Human Voice Identification

An important acoustic feature for the detection is the conversation, which identifies human speech as well as the family members who participate in it. Among all the family communications, the family meal is typically accompanied by a considerable amount of conversation. The speaker recognition technique presented in [10] shows that the pronunciation of vowels is an identifying characteristic of a speaker. However, maintaining a database for the voice of each family member is costly for mobile devices. Here, the row mean vector of the spectrogram [20] provides an effective and efficient approach to recognizing speakers by measuring the Euclidean distance of the energy distribution in the frequency domain. Specifically, the family members are required to register their voice with FamilyLog by reading a short sentence. For each frame, FamilyLog compares the vector from MFCC-based processing with the ones obtained during training, and uses cosine similarity to calculate the probability that the frame contains the voice of at least one family member, represented as P(voice|E), where E is the energy distribution in the frame, as shown in Fig. 5. In a detection window, FamilyLog sums P(voice|E) over the frames to obtain E[N_voice], the expected number of frames that contain family members' voices.
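The following sketch illustrates how E[N_voice] could be accumulated from per-frame cosine similarities. How FamilyLog maps a similarity score to P(voice|E) is not described, so `p_voice` below simply clips the best similarity to [0, 1]; that mapping, and the short 3-element vectors, are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two energy-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def p_voice(frame_energy, registered):
    """Hypothetical P(voice | E): best similarity to any registered voice."""
    best = max(cosine(frame_energy, ref) for ref in registered)
    return max(0.0, min(1.0, best))

def expected_voice_frames(window, registered):
    """E[N_voice]: sum of P(voice | E) over the frames of a window."""
    return sum(p_voice(e, registered) for e in window)

# Two registered family members (toy vectors instead of 21 channels).
registered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
window = [[2.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 3.0, 0.0]]
```

In this toy window the first and third frames match a registered voice exactly and the second matches neither, so the expected count is 2.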

Fig. 5. An example of conversation detection during a typical dining scenario.

2) Localization-based TV Viewing Detection

A TV is a sound source with a fixed location, whose volume usually stays within a limited range. In contrast, the clattering sound and the conversation during the mealtime come from multiple sound sources. We can therefore focus on detecting the number of sound sources to detect the existence of the TV, and separate TV sound from the clattering sound or the human voice. FamilyLog employs a novel approach based on Interaural Level Difference (ILD) [3] that fuses acoustic features captured by different devices (features are exchanged via cloud servers, local Wi-Fi, or Bluetooth connections) to determine the sound sources. In this section we only focus on the fusion algorithm for two devices, although it can be extended to more generic scenarios. Specifically, the process of feature fusion consists of two steps: similarity check and sound source detection. In the first step, it figures out whether two devices are at home and near each other by examining the similarity between the sound captured by the two devices. We define the detection windows that cover the same period of time on two different devices as the binaural detection windows. The similarity between binaural detection windows A and B can be calculated as follows:

$C(A, B) = \frac{\sum_{i = 1}^{l} \cos(E(A, i), E(B, i))}{l}$

(1)

where vector E(X, i) is the energy distribution for frame i in detection window X, and l is the number of frames in a detection window. FamilyLog only proceeds to conduct sound source detection if C(A,B) is above a threshold, indicating that the two devices are in proximity to one another. The sound source detection aims to detect the number of sound sources in binaural detection windows. A key observation is that if all of the acoustic signal originates from a single sound source, it is more likely caused by the TV. In contrast, if the acoustic signal originates from multiple sound sources, it is more likely to be caused by human activities other than TV. The method we use to detect sound sources is based on acoustic localization by ILD. Specifically, if the acoustic signal is from a single source and captured by two receivers, it satisfies $V_{1} / V_{2} = d_{1}^{2} / d_{2}^{2} = \Delta_{V}$, where $V_{1}$ and $V_{2}$ are the volumes received by the receivers, calculated by the RMS, and $d_{1}$ and $d_{2}$ are the distances between the receivers and the sound source. This equation can be applied to compute the relative distances between the sound source and the devices. In indoor scenarios, $\Delta_{V}$ may be impacted by various factors (e.g., echoes and obstacles), but its coefficient of variation is limited when $d_{1}$ and $d_{2}$ are fixed. To detect whether the acoustic signals come from the same source, we define the Coefficient of Variation of Volume Ratio per Frame (CV(A,B)) in binaural detection windows A and B as:

$\begin{matrix} CV(A, B) = \frac{\sigma(\Delta_{V}(A, B))}{\mu(\Delta_{V}(A, B))} \\ \Delta_{V}(A, B) = \left\{ \frac{V_{A, i}}{V_{B, i}},\ i \in [1, l] \right\} \end{matrix}$

(2)

Here, the volume of frame i in detection window X is represented by $V_{X,i}$, $\mu(\Delta_{V}(A,B))$ is the mean of the volume ratios between A and B, and $\sigma(\Delta_{V}(A,B))$ is the standard deviation of the volume ratios. CV(A,B) thus is the ratio of the standard deviation to the mean. The lower CV(A,B) is, the more likely the acoustic signals come from a single source. Fig. 6 shows an example of how to detect sound sources by volume ratio. In the first 20 seconds, phone B is carried by the user from the dining table to the sofa. The TV is turned on at the 30th second. During the 70th–75th second and the 140th–150th second, the subjects talk to each other. We can see that when the frames only contain TV sound, the volume ratio is relatively stable. In contrast, as conversation involves multiple sound sources, the variance of the volume ratio is significantly increased.
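Equations (1) and (2) translate almost directly into code. The sketch below is a minimal illustration that assumes frames on the two devices are already time-aligned and that all volumes in window B are positive; the use of the population standard deviation is also an assumption, since the text does not say which estimator is meant.

```python
import math
from statistics import mean, pstdev

def cosine(u, v):
    """Cosine similarity between two energy-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def window_similarity(energy_a, energy_b):
    """Eq. (1): mean per-frame cosine similarity between windows A and B."""
    sims = [cosine(a, b) for a, b in zip(energy_a, energy_b)]
    return sum(sims) / len(sims)

def volume_ratio_cv(vol_a, vol_b):
    """Eq. (2): coefficient of variation of the per-frame volume ratios."""
    ratios = [a / b for a, b in zip(vol_a, vol_b)]
    return pstdev(ratios) / mean(ratios)

# Single fixed source: device A always hears it twice as loud as device B.
single_cv = volume_ratio_cv([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
# Several sources at different positions: the ratio jumps between frames.
multi_cv = volume_ratio_cv([1.0, 4.0, 1.0, 4.0], [2.0, 2.0, 2.0, 2.0])
```

The single-source case gives a coefficient of variation of 0, while the alternating ratios 0.5 and 2 give 0.6, matching the intuition that a stable ratio points to one fixed source such as a TV.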

Fig. 6. An example of TV viewing with conversation. (a) shows the volume captured by two smartphones. (b) shows the volume ratio between corresponding frames.

By detecting the sound source with multiple devices, the accuracy of the detection of family meals can be improved in several challenging scenarios. TV programs that contain sounds similar to a family meal or conversation might otherwise be misclassified. However, the frames that contain such clattering sound and conversation still come from a single source, so they are more likely to be attributed to the TV than to family activities.

TV sound during the family meal can be separated from "foreground" sounds (clattering, conversation, etc.) by extracting low-energy frames, i.e., the frames that have an RMS less than the average RMS in a detection window. To detect whether the TV is on during the family meal, we can check the volume of sound from all low-energy frames, and whether the acoustic signal is probably from a single sound source. If the TV is on, the continuous sound from the TV will raise the volume of the low-energy frames, and CV(A,B) of all low-energy frames will have a relatively low value, indicating that the sound comes from a single sound source with a fixed location.
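A possible shape for the low-energy-frame check is sketched below. The thresholds `min_volume` and `max_cv` are hypothetical, and selecting low-energy frames by device A's average RMS alone is an assumption; the text does not say how the two devices' frames are jointly selected.

```python
from statistics import mean, pstdev

def tv_likely_on(vol_a, vol_b, min_volume, max_cv):
    """Heuristic TV check on the low-energy ('background') frames.

    vol_a, vol_b: per-frame RMS volumes on two nearby devices.
    min_volume, max_cv: hypothetical thresholds, not values from the paper.
    """
    avg_a = mean(vol_a)
    # Keep frames quieter than the window average (foreground removed).
    low = [(a, b) for a, b in zip(vol_a, vol_b) if a < avg_a and b > 0]
    if len(low) < 2:
        return False
    background = mean([a for a, _ in low])
    ratios = [a / b for a, b in low]
    cv = pstdev(ratios) / mean(ratios)
    # TV on: background is audibly loud AND its volume ratio is stable.
    return background >= min_volume and cv <= max_cv

# TV on: a steady background (0.2 vs 0.1) with one loud foreground frame.
tv_on = tv_likely_on([0.2, 0.2, 0.9, 0.2, 0.2], [0.1, 0.1, 0.3, 0.1, 0.1], 0.1, 0.2)
# TV off: the background is nearly silent.
tv_off = tv_likely_on([0.01, 0.01, 0.9, 0.01, 0.01], [0.01, 0.01, 0.3, 0.01, 0.01], 0.1, 0.2)
```

Both conditions matter: a loud but unstable background suggests multiple moving sources, and a stable but very quiet background suggests that nothing is playing.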

D. HMM-based Classification

Similar to speech and gesture recognition, family meal detection involves identifying a temporal pattern rather than detecting discrete events. We design the classifier of FamilyLog based on HMM, where we treat extracted features as observations, and the family event contained in each detection window as the hidden state. Therefore, the primary goal of our HMM-based classification is to recover the family events over time using the features extracted from a sequence of detection windows.

Fig. 7 shows the HMM-based classifier for the family meal as an example. We can see that in this case, the state is either "having meal" or "not having meal", and the observations include four features extracted from each detection window. The transition probabilities between the two states are generated based on a simple survey conducted before using the system. The emission parameters associating states with observations are calculated using the one-day training data. Therefore, we formally define our HMM-based classification as follows:

where X is a sequence of states; λ represents the transition probabilities Φ and emission parameters Θ of the HMM; O is a sequence of observations. The output of the classifier is the sequence of states that maximizes the likelihood, which can be calculated by using the Viterbi algorithm.
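For readers unfamiliar with the Viterbi algorithm, the sketch below shows the textbook log-space version for a small discrete-state HMM. It is a generic illustration, not FamilyLog's implementation: the initial distribution `init`, the constant transition table, and the per-window emission likelihoods are all toy values chosen for the example.

```python
import math

def viterbi(init, trans, emit_seq):
    """Return the most likely state sequence of a discrete-state HMM.

    init[s]: probability that the first state is s.
    trans[p][s]: probability of moving from state p to state s.
    emit_seq[t][s]: likelihood of the t-th observation in state s.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(init)
    score = [log(init[s]) + log(emit_seq[0][s]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for emit in emit_seq[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda p: prev[p] + log(trans[p][s]))
            ptr.append(best)
            score.append(prev[best] + log(trans[best][s]) + log(emit[s]))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # follow the back-pointers to the start
        state = ptr[state]
        path.append(state)
    return path[::-1]

# State 0 = "having meal", state 1 = "not having meal" (toy numbers).
init = [0.2, 0.8]
trans = [[0.8, 0.2], [0.2, 0.8]]
emit_seq = [[0.1, 0.9], [0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
path = viterbi(init, trans, emit_seq)
```

With these toy inputs the decoded sequence is "no meal, meal, meal, no meal", i.e. the two middle windows are labeled as a meal.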

Fig. 7. The Hidden Markov Model of one family communication activity.

1) Transition Probabilities

The set of transition probabilities Φ contains four entries {ϕ_(1,1), ϕ_(1,2), ϕ_(2,1), ϕ_(2,2)}. According to the definition of our HMM, when the activity is not occurring, we only need to know the probability of its occurrence in the next detection window. On the other hand, while the activity is currently occurring, we only need to know the probability that it continues in the next detection window. Therefore, two models are enough to describe all the transition probabilities: the probability distribution of an activity's occurrence with respect to time/date, and the probability distribution of its duration.

The probability distributions can be estimated based on one-time user answers to questions like "What's the typical frequency and duration of your weekday family meals?". Alternatively, they can be derived from historical detection results. To improve the accuracy of such an approach, FamilyLog presents intuitive system UIs that allow users to rate previous detection results. The characteristics of a family meal, including time, duration, and frequency, are frequently highly dependent on the day of the week. Therefore, FamilyLog generates different models for weekdays and weekends.

The transition probabilities of our HMM can be read directly from the generated models. When applying the Viterbi algorithm, the transition probability from S_2 to S_1 is equal to the probability of the activity's occurrence; the transition probability from S_1 to S_1 is equal to the probability of the activity's continuation into the next 3 minutes. Because the transition probabilities depend on previous states (due to the influence of the activity's duration), we need to adapt the structure of our HMM to ensure that the Viterbi algorithm runs properly. As shown in Fig. 8, in the adapted structure all transition probabilities are independent of previous states, at the price of increased memory usage. In practice, we run the Viterbi algorithm for 40 states in this HMM (i.e., 40 detection windows) to ensure that any activity lasting up to two hours can be captured.

Fig. 8. The adapted HMM structure. An activity is divided into multiple states S_{1,k}, indicating that the activity has already lasted for k detection windows. Here S_{1,k} can only transition to S_{1,k+1} or S_2, corresponding to the cases where the activity continues into the next detection window or stops, respectively.
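
One way to obtain such duration-indexed transitions, sketched below in Python under stated assumptions, is to convert a duration distribution into per-state stop probabilities: the chance of stopping after k windows is P(duration = k) divided by P(duration ≥ k). The function name `expand_duration_states`, the constant start probability `p_start`, and the example numbers are hypothetical; in the system described here the occurrence probability also depends on time and date, which this sketch ignores.

```python
def expand_duration_states(duration_pmf, p_start):
    """Build transitions for states "S2" (not occurring) and ("S1", k).

    duration_pmf[k - 1] : P(activity lasts exactly k detection windows)
    p_start             : P(activity starts in the next window | not occurring)
    Returns a dict mapping each state to {next_state: probability}.
    """
    n_max = len(duration_pmf)
    trans = {"S2": {"S2": 1.0 - p_start, ("S1", 1): p_start}}
    survive = 1.0  # P(duration >= k), updated as k grows
    for k in range(1, n_max + 1):
        if k == n_max or survive <= 0.0:
            stop = 1.0  # the longest modeled duration must end
        else:
            # P(stops after window k | it has already lasted k windows)
            stop = min(1.0, duration_pmf[k - 1] / survive)
        row = {"S2": stop}
        if k < n_max:
            row[("S1", k + 1)] = 1.0 - stop
        trans[("S1", k)] = row
        survive -= duration_pmf[k - 1]
    return trans

# Hypothetical example: a meal lasts 1, 2, or 3 windows with these chances.
example = expand_duration_states([0.25, 0.25, 0.5], p_start=0.1)
```

Each state then carries its own fixed outgoing probabilities, so the decoder no longer needs to look back at how long the activity has lasted; the cost is one extra state per modeled window of duration.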

2) Emission Parameters

The set of emission parameters Θ contains entries θ_{S,o}, each of which describes the probability of finding the observation o in state S. The observation o within a detection window is represented as a vector of features, i.e., o = <o_1, o_2, o_3, …>, where o_i corresponds to a feature related to the activity. For the detection of family meals, the features are shown in Table I.

TABLE I

Features for the family meal detection

Term	Description
E[N_clattering]	The expectation of the number of frames containing clattering sound
E[N_voice]	The expectation of the number of frames containing the family members' voices
${\bar{Acc}}_{x}$	The average acceleration on the X-axis
$\bar{R}_c$	The changing rate of acceleration
CV(A,B)	The coefficient of variation of the volume ratio per frame in binaural detection windows A and B

The HMM classifier is trained with a period of sensor data. Typically, at least a whole day is required to fully cover the communication of a family. After training, we apply Gaussian KDE to calculate the PDF, represented as p(o|S), of the observations associated with each state. Therefore, the emission parameter is defined as θ_{S,o} = p(o|S) for the HMM with multiple continuous observations [17].
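
To make the role of p(o|S) concrete, the following Python sketch estimates a per-state density with a simplified Gaussian kernel density estimator. It is not the authors' code: it assumes a product Gaussian kernel with hand-picked, fixed per-feature bandwidths, and the two-feature training vectors are invented for illustration.

```python
import math

def gaussian_kde_pdf(samples, bandwidths):
    """Return a function o -> estimated density p(o) from training vectors.

    samples    : list of equal-length feature vectors observed in one state
    bandwidths : one kernel width per feature (assumed fixed, not learned)
    """
    n = len(samples)
    norm = 1.0
    for h in bandwidths:
        norm *= h * math.sqrt(2.0 * math.pi)

    def pdf(o):
        total = 0.0
        for x in samples:
            # squared distance scaled by the bandwidth of each feature
            z = sum(((oi - xi) / h) ** 2
                    for oi, xi, h in zip(o, x, bandwidths))
            total += math.exp(-0.5 * z)
        return total / (n * norm)

    return pdf

# Hypothetical training windows: (clattering frames, voice frames).
meal_windows = [(12.0, 9.0), (10.0, 11.0), (14.0, 8.0)]
no_meal_windows = [(1.0, 2.0), (0.0, 1.0), (2.0, 0.0)]

# One density per state plays the role of the emission parameter p(o|S).
emission = {
    "meal": gaussian_kde_pdf(meal_windows, (2.0, 2.0)),
    "no_meal": gaussian_kde_pdf(no_meal_windows, (2.0, 2.0)),
}
```

A new window such as `(11.0, 10.0)` then receives a much higher density under `emission["meal"]` than under `emission["no_meal"]`, and these densities are what the Viterbi decoder combines with the transition probabilities.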

V. Performance Evaluation

To evaluate the performance of FamilyLog, we have collected 77 days of data from 37 subjects in 8 families (details shown in Table II). The data collection process was approved by the Institutional Review Board (IRB) at Michigan State University. The period of data collection was one or two weeks for each family. We intentionally chose families with young children for this study because family routine analysis has important implications for children's health. Our results also showed that young children often presented challenges to event detection due to the excessive noise they make at home.

TABLE II

Families that participated in the experiment

Family	Children (Ages in Years)	Phone	Smartwatch	Data (Weeks)	Family Meals (Number of Times)
1	1 daughter (5)	Nexus 4	N/A	1	4
2	1 daughter (4)	Nexus 4	N/A	1	6
3	2 daughters (5, 8), 2 sons (1, 3)	Nexus 4	Sony Smartwatch 3	2	9
4	3 sons (1, 3, 5)	Nexus 3	Sony Smartwatch 3	2	16
5	2 sons (3, 5)	Moto G	N/A	1	5
6	2 daughters (1, 3), 1 son (7)	Moto G2 × 2	Sony Smartwatch 3 × 2	2	22
7	2 daughters (3, 11), 2 sons (7, 13)	Moto G2 × 2	N/A	1	10
8	3 daughters (7, 10, 18)	Moto G2 × 2	Sony Smartwatch 3	1	6

We provided each family with one or multiple devices. An app pre-installed on the devices continuously records audio and motion unless the device is taken out of the home. The app runs automatically, but the subjects can manually start or stop the recording on any device. The devices can be carried by the subjects or left somewhere in the house, depending on their habits. We also offered them the opportunity to review the recording and delete any part that raised privacy concerns. We adopted two methods to obtain the ground truth: an interview with the family members immediately after data collection finished, and listening to the recordings to manually label family activities.

A. Micro-scale Routine Analysis

To evaluate the performance of our HMM-based classifier, we compare our classification result with the result produced by a Support Vector Machine (SVM), which recognizes family meals based only on features in individual detection windows rather than considering their temporal nature. The overall performance of FamilyLog and its comparison with SVM will be discussed later in Section V-B.

Fig. 9 shows the detection results along with the ground truth for 5 days of data from Family 4. We can see that the family usually has dinner around 7–8 pm for about an hour, except for day 5, a Friday, when they started dinner at around 8 pm and ate for about 20 minutes. Compared with the ground truth, FamilyLog is accurate in detecting most of the meals. On days 3 and 5, the SVM classifier yields a few misclassifications due to the interference caused by TV viewing. However, FamilyLog's HMM-based classifier is able to avoid such false negative errors. Furthermore, by taking into account the temporal nature of family routine activities, the HMM is able to minimize short false negative and false positive classification results.

Fig. 9. Detected family meals based on data collected from Family 4 during 5 days.

B. Evaluation of Meal Detection

In this section, we investigate the overall performance of FamilyLog in detecting family meals. For each individual family, the HMM-based classifier is trained using the information from the survey and the data labeled by the subjects collected on the first day. We use precision and recall as the metrics for this evaluation. Specifically, precision is defined as the ratio of the number of true-positive windows to the total number of windows detected as family meals. Recall is defined as the number of true-positive windows divided by the total number of windows that actually contain family meals according to the ground truth. True negatives are not considered, because most of the windows containing no activities can be detected, and they have been discarded. In addition, we also present the evaluation result after applying a certain relaxation (e.g., ±3 min) to the start/end time. Note that our design objective is not affected by minor errors in start/end time, as long as the system is able to accurately identify the occurrences of the family meals.
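
The window-level metrics can be sketched in Python as follows, using the standard definitions (precision = true positives / windows detected as meals; recall = true positives / windows labeled as meals in the ground truth). The relaxed variant is only one possible reading of the start/end-time relaxation, assumed here to mean that a window counts as matched if the other sequence is positive within ±r windows; the function names and example labels are hypothetical.

```python
def precision_recall(detected, truth):
    """Window-level precision and recall from two equal-length bool lists."""
    tp = sum(1 for d, t in zip(detected, truth) if d and t)
    n_detected = sum(1 for d in detected if d)
    n_truth = sum(1 for t in truth if t)
    precision = tp / n_detected if n_detected else 0.0
    recall = tp / n_truth if n_truth else 0.0
    return precision, recall

def dilate(flags, r):
    """Mark a window positive if any window within +/- r of it is positive."""
    return [any(flags[max(0, i - r): i + r + 1]) for i in range(len(flags))]

def relaxed_precision_recall(detected, truth, r):
    """Same metrics, tolerating boundary errors of up to r windows."""
    truth_wide = dilate(truth, r)
    detected_wide = dilate(detected, r)
    n_detected = sum(1 for d in detected if d)
    n_truth = sum(1 for t in truth if t)
    hit_p = sum(1 for d, t in zip(detected, truth_wide) if d and t)
    hit_r = sum(1 for t, d in zip(truth, detected_wide) if t and d)
    precision = hit_p / n_detected if n_detected else 0.0
    recall = hit_r / n_truth if n_truth else 0.0
    return precision, recall

# Hypothetical 8-window example: detection is shifted one window early.
truth = [False, False, True, True, True, True, False, False]
detected = [False, True, True, True, True, False, False, False]
strict = precision_recall(detected, truth)
relaxed = relaxed_precision_recall(detected, truth, r=1)
```

In this example the strict metrics are both 0.75, while a one-window relaxation (3 minutes, if a detection window is 3 minutes long) brings both to 1.0, which illustrates why small boundary errors matter little when the goal is to identify meal occurrences.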

The evaluation result of family meal detection is shown in Fig. 10. Our HMM-based classifier outperforms SVM by 6.82% on average in recall. This is primarily because the HMM is more effective in correcting isolated false negatives. We can also observe that FamilyLog achieves an overall precision of 81.1%, with the highest being 91.1% for Family 1 and the lowest being 62% for Family 4. We found that the two major causes of the relatively low precision in Families 4 and 5 are high-pitched voices from children and music, which have acoustic features similar to the clattering sound during a meal. However, since these sounds usually have a short duration, FamilyLog is able to correct a considerable number of the resulting false positives.

Fig. 10. Overall accuracy of family meal detection in detection windows. The average precision and recall of FamilyLog are 80.7% and 89.5%, respectively.

For detection based only on the motion data from the smartwatch or only on the acoustic data, the accuracy is shown in Fig. 11 for Families 3, 4, and 6. For Families 3 and 4, the precision is very low when only motion data is used. The reason is that the motion data for the eating activity can be very similar to that of activities like reading or writing, especially when the smartwatch is worn on the non-dominant hand. On the other hand, the recall is relatively high, because most of the family meals can be correctly detected from the motion data. Moreover, the smartwatch in Family 6 is rarely worn when they are at home, so detection based on the motion data is not always reliable. Generally, the features from the acoustic data contribute the major part of the detection, and the motion data can aid the detection in some special cases. For example, depending on the food, the clattering sound may be weak for a family meal, but the detection result can still be correct due to the conversation between the family members and the eating activity.

Fig. 11. Accuracy of family meal detection using only motion or acoustic data.

Fig. 12 shows a comparison of the detection accuracy achieved by a single smartphone and by all the available devices in a family. If FamilyLog runs on only a single device, sound source detection is unavailable. This happens in Family 6, where, without knowledge of the sound sources, the sound from a TV program about cooking is wrongly detected as a meal. Furthermore, during a family meal, if one device is left far away from the dining table but another device is near, it is possible that the meal can be detected by only one device. By combining the results from all the available devices, FamilyLog is less likely to miss a family meal than when relying on only one of them.

Fig. 12. Accuracy of family meal detection by a single phone or all the available devices.

Fig. 13 shows the detection accuracy for the occurrence of each family meal. We can see that FamilyLog rarely fails to detect an occurrence of a family meal with the 9 min relaxation on the start/end time, achieving 88.7% precision and 93.3% recall on average. The detection error in each family meal's duration is about 4 minutes on average.

Fig. 13. Accuracy of detection for each occurrence of a family meal when relaxing the start/end time by ±3 min and ±9 min.

C. Participant and TV Detection

The human voice serves as a clue for family meals, and is also a unique characteristic of a participant. With permission from Families 1–5, we listened to the raw acoustic data they provided and manually labeled the family members who talked during each family meal. For each family, we focus only on the mother, the father, and one selected child aged 5–12. We count the number of detection windows in which FamilyLog correctly detects all the participants, and calculate the precision and recall. Fig. 14(c) shows that FamilyLog achieves high accuracy in participant detection across different families, with the average precision and recall being 97.8% and 92.8%, respectively. This means most of the detection windows yield correct results for participant detection. This also ensures high accuracy in detecting all participants for each occurrence of a family meal.

Fig. 14. Evaluation of participant detection. (a) shows the accuracy of speaker recognition. (b) shows each member's overall proportion in family conversation. (c) shows the accuracy of overall participant detection for each family. The average precision and recall are 97.8% and 92.8%, respectively.

Fig. 14(a) shows the participant identification result for each family member. One key observation is that the overall recognition accuracy for other family members is better than that for fathers. This is mainly because fathers usually speak in short sentences or only phrases, which are more difficult to detect. Fig. 14(b) shows the proportion of each family member in the overall conversation. We can see that fathers speak less often than other family members, which is also consistent with the findings from social behavior studies [31]. Another observation is that the child from Family 5 has a relatively low recall. This is mainly because he often speaks with different tones, so his voice is difficult to identify using the signature extracted from his training data.

Table III shows the result of detecting TV viewing during the family meals. Some of the detections are unavailable because only one device was used for the experiment, while sound source detection requires at least two different devices. The accuracy is 91.8% precision and 89.5% recall. The errors are mainly caused by noise or by a long distance between the TV and the devices. It can be seen that the TV is turned on during over 60% of the family meals in our experiment.

TABLE III

TV viewing during family meals

Total: 78		Detected		Not available
		TV is on	TV is off
True	TV is on	34	4	15
	TV is off	3	22
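
As a quick arithmetic check of Table III, the short Python sketch below recomputes the TV-detection metrics from the four confusion-matrix counts. It assumes that "TV is on" is the positive class and that the 15 meals without sound source detection are excluded; the variable names are ours.

```python
# Counts read from Table III (rows: ground truth, columns: detected).
true_on_detected_on = 34
true_on_detected_off = 4
true_off_detected_on = 3
true_off_detected_off = 22
not_available = 15

evaluated = (true_on_detected_on + true_on_detected_off
             + true_off_detected_on + true_off_detected_off)

# Precision: of the meals detected as "TV is on", how many truly were.
precision = true_on_detected_on / (true_on_detected_on + true_off_detected_on)
# Recall: of the meals where the TV truly was on, how many were detected.
recall = true_on_detected_on / (true_on_detected_on + true_on_detected_off)
# Share of evaluated meals during which the TV was on.
tv_on_share = (true_on_detected_on + true_on_detected_off) / evaluated

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"tv_on_share={tv_on_share:.3f} total={evaluated + not_available}")
```

Under these assumptions the counts give a precision of 34/37 (about 0.919), a recall of 34/38 (about 0.895), and a TV-on share of 38/63 (about 0.603), in line with the reported figures up to rounding, and the five cells sum to the 78 family meals listed in Table II.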

VI. Conclusion

In this paper we present the design and implementation of FamilyLog, a practical system to log family mealtime activities using off-the-shelf smartwatches and smartphones. It uses the built-in accelerometers on the smartwatches and the microphones of all mobile devices to detect family mealtime activities that are closely related to family health, including the occurrence and duration of meals, conversations, participants, and TV viewing. The design of FamilyLog addresses several challenges, such as the significant interference from various noises in the home. We carefully analyze the sensor data collected from real families and design the signal features for HMM-based activity classification, which are robust against various noises and can be computed efficiently on mobile devices. We have evaluated FamilyLog with extensive experiments involving 8 families with children (at least one week of recording in each family). Our results show that FamilyLog is effective in logging details of family meals across different families and home environments.

Acknowledgments

This work was supported in part by the U.S. National Science Foundation under grant IIS1521722.

Footnotes

¹Most off-the-shelf smartwatches ship with a microphone for voice commands and making calls.

References

1. Avancha S, Baxi A, Kotz D. Privacy in mobile technology for personal healthcare. ACM Comput Surv. 2012 Dec;45(1):3:1–3:54.

2. Bellissimo N, Pencharz PB, Thomas SG, Anderson GH. Effect of television viewing at mealtime on food intake after a glucose preload in boys. Pediatric Research. 2007;61(6):745–749.

3. Birchfield ST, Gangishetty R. Acoustic localization by interaural level difference. Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP'05). IEEE International Conference on; IEEE; 2005. pp. iv–1109.

4. Blum-Kulka S. Dinner talk: Cultural patterns of sociability and socialization in family discourse. Routledge; 1997.

5. Boyce WT, Jensen EW, Cassel JC, Collier AM, Smith AH, Ramey CT. Influence of life events and family routines on childhood respiratory tract illness. Pediatrics. 1977;60(4):609–615.

6. Bryant WK, Zick CD. An examination of parent-child shared time. Journal of Marriage and Family. 58(1):227–237.

7. Denham SA. Relationships between family rituals, family routines, and health. Journal of Family Nursing. 2003;9(3):305–330.

8. Dishion TJ, Nelson SE, Kavanagh K. The family check-up with high-risk young adolescents: Preventing early-onset substance use by parent monitoring. Behavior Therapy. 2003;34(4):553–571.

9. Dunst CJ, Hamby D, Trivette CM, Raab M, Bruder MB. Everyday family and community life and children's naturally occurring learning opportunities. Journal of Early Intervention. 2000;23(3):151–164.

10. Fakotakis N, Tsopanoglou A, Kokkinakis G. Text-independent speaker recognition based on vowel spotting. Digital Processing of Signals in Communications, 1991., 6th International Conference on; Sep 1991. pp. 272–277.

11. Fiese BH, Hammons A, Grigsby-Toussaint D. Family mealtimes: a contextual approach to understanding childhood obesity. Economics & Human Biology. 2012;10(4):365–374.

12. Ginsburg KR, et al. The importance of play in promoting healthy child development and maintaining strong parent-child bonds. Pediatrics. 2007;119(1):182–191.

13. Goebel KP, Hennon CB. Mother's time on meal preparation, expenditures for meals away from home, and shared meals: Effects of mother's employment and age of younger child. Home Economics Research Journal. 1983;12(2):169–188.

14. Greening L, Stoppelbein L, Konishi C, Jordan SS, Moll G. Child routines and youths' adherence to treatment for type 1 diabetes. Journal of Pediatric Psychology. 2007;32(4):437–447.

15. Gupta S, Reynolds MS, Patel SN. Electrisense: Single-point sensing using EMI for electrical event detection and classification in the home. Proceedings of the 12th ACM International Conference on Ubiquitous Computing, Ubicomp '10; New York, NY, USA. ACM; 2010. pp. 139–148.

16. Hao T, Xing G, Zhou G. iSleep: Unobtrusive sleep quality monitoring using smartphones. Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems, SenSys '13; New York, NY, USA. ACM; 2013. pp. 4:1–4:14.

17. Honkela A. PhD thesis. Citeseer: 2001. Nonlinear switching state-space models.

18. Huang Z, Zhu T. Sbd: A signature-based detection for activities of appliances: Poster abstract. Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings, BuildSys '14; New York, NY, USA. ACM; 2014. pp. 222–223.

19. Kadomura A, Li C-Y, Chen Y-C, Chu H-H, Tsukada K, Siio I. Sensing fork and persuasive game for improving eating behavior. Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication; ACM; 2013. pp. 71–74.

20. Kekre HB, Athawale A, Desai M. Speaker identification using row mean vector of spectrogram. Proceedings of the International Conference & Workshop on Emerging Trends in Technology, ICWET '11; New York, NY, USA. ACM; 2011. pp. 171–174.

21. Kopparapu S, Laxminarayana M. Choice of mel filter bank in computing MFCC of a resampled speech. Information Sciences Signal Processing and their Applications (ISSPA), 2010 10th International Conference on; May 2010. pp. 121–124.

22. Ma X, Fellbaum C, Cook PR. Soundnet: Investigating a language composed of environmental sounds. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '10; New York, NY, USA. ACM; 2010. pp. 1945–1954.

23. Nirjon S, Dickerson RF, Asare P, Li Q, Hong D, Stankovic JA, Hu P, Shen G, Jiang X. Auditeur: A mobile-cloud service platform for acoustic event detection on smartphones. Proceeding of the 11th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '13; New York, NY, USA. ACM; 2013. pp. 403–416.

24. U. B. of Labor Statistics. American Time Use Survey. 2015.

25. O'Shaughnessy D. Addison-Wesley series in electrical engineering: digital signal processing. Universities Press (India) Pvt. Limited; 1987. Speech communication: human and machine.

26. Phillips DE, Tan R, Moazzami M-K, Xing G, Chen J, Yau DKY. Supero: A sensor system for unsupervised residential power usage monitoring. 2014 IEEE International Conference on Pervasive Computing and Communications (PerCom); 2013. pp. 66–75.

27. Prasad A, Sorber J, Stablein T, Anthony D, Kotz D. Understanding sharing preferences and behavior for mHealth devices. Proceedings of the 2012 ACM Workshop on Privacy in the Electronic Society, WPES '12; New York, NY, USA. ACM; 2012. pp. 117–128.

28. Rabbi M, Ali S, Choudhury T, Berke E. Passive and in-situ assessment of mental and physical well-being using mobile sensors. Proceedings of the 13th International Conference on Ubiquitous Computing, UbiComp '11; New York, NY, USA. ACM; 2011. pp. 385–394.

29. Sahidullah M, Saha G. Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition. Speech Communication. 2012;54(4):543–565.

30. Stevens SS. A Scale for the Measurement of the Psychological Magnitude Pitch. Acoustical Society of America Journal. 1937;8:185.

31. Tannen D. You just don't understand: Women and men in conversation. Virago; London: 1991.

32. Tharini D, Kumar J. 21 band 1/3-octave filter bank for digital hearing aids. Pattern Recognition, Informatics and Medical Engineering (PRIME), 2012 International Conference on; March 2012. pp. 353–358.

33. Thomaz E, Essa I, Abowd GD. A practical approach for recognizing eating moments with wrist-mounted inertial sensing. Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing; ACM; 2015. pp. 1029–1040.

34. Thomaz E, Zhang C, Essa I, Abowd GD. Inferring meal eating activities in real world settings from ambient sounds: A feasibility study. Proceedings of the 20th International Conference on Intelligent User Interfaces, IUI '15; New York, NY, USA. ACM; 2015. pp. 427–431.

35. Xu C, Li S, Liu G, Zhang Y, Miluzzo E, Chen Y-F, Li J, Firner B. Crowd++: Unsupervised speaker count with smartphones. Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp '13; New York, NY, USA. ACM; 2013. pp. 43–52.
