Papers

合成人口データの意義と利用可能性―仮想都市データの有用性と秘匿性の評価から―

Published in 統計研究彙報, 2024

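Synthetic data have been studied as pseudo-data that protect the confidentiality of individual records not merely through anonymization but by generating attribute values. Separately, there is a kind of data called a synthetic population, which was developed along a different path from synthetic data: it is pseudo-microdata generated by numerical computation from aggregate data published as statistical tables, for the purpose of simulation research. Like synthetic data and general-purpose microdata, a synthetic population conceals the information in individual records while preserving the properties of the original aggregate data as far as possible. Moreover, because a synthetic population is not based on real individual records and has therefore been assumed to be confidential as a matter of course, no quantitative evaluation of its confidentiality has been presented. In this paper, for a synthetic population generated from virtual city data, we confirm its usefulness by reorganizing the results reported in Harada et al. (2022), and then evaluate its confidentiality by computing the ARD (Absolute Relative Difference), thereby arguing for the significance and usability of synthetic populations.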
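As a rough illustration of the ARD mentioned in the abstract above, the following sketch assumes the common definition |synthetic - original| / |original| for a single numeric value. The paper's exact definition and the way it aggregates ARD values are not given here, so the function name and the sample values are hypothetical.

```python
def absolute_relative_difference(original: float, synthetic: float) -> float:
    """Return |synthetic - original| / |original| (assumed definition of ARD)."""
    if original == 0:
        # The ratio is undefined for a zero reference value.
        raise ValueError("ARD is undefined when the original value is zero")
    return abs(synthetic - original) / abs(original)


# Hypothetical attribute values (e.g. ages) of matched original/synthetic records.
original_values = [34.0, 61.0, 8.0]
synthetic_values = [36.0, 61.0, 10.0]

ard_values = [
    absolute_relative_difference(o, s)
    for o, s in zip(original_values, synthetic_values)
]
```

Under this reading, an ARD close to zero means the synthetic value nearly reproduces the original one, which is the situation a confidentiality evaluation would want to flag.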

Recommended citation: 原田 拓弥, 松本 渉, 村田 忠彦: 合成人口データの意義と利用可能性―仮想都市データの有用性と秘匿性の評価から―, 統計研究彙報, No. 81, pp. 53-68 (2024)

Download here

Workplace Assignment to Workers in Synthetic Populations in Japan

Published in IEEE Transactions on Computational Social Systems, 2023

In this article, we assign workplace attributes to each worker in each household in a synthetic population using multiple censuses conducted in Japan. A synthetic population is a set of artificial individual attributes for each resident, synthesized according to census data. We have synthesized a set of synthetic populations of Japan. We assign a workplace attribute to each worker to estimate daytime population distributions and to develop activity-based models for agent-based simulations or microsimulations. Although statistical information on residential areas and workplaces is released by the government and some individual movement data are released by cellphone companies, it is hard to collect information that links a worker's home and workplace locations with their family and employment information. We employ origin–destination–industry (ODI) statistics to estimate workplaces for workers. Since some attributes in ODI statistics are not available for privacy reasons, we propose a workplace assignment method for all cities, towns, and villages using restricted ODI and OD statistics in Japan. We show how large the difference is between the number of workers obtained from the complete ODI statistics and the number obtained by the proposed workplace assignment method. We show that 88.2% of workers in a city in Japan are assigned to the correct cities as workplaces by our proposed method. We also show several maps of daytime population distributions produced by our proposed method. Synthetic populations with workplace attributes enable real-scale social simulations to design transport or business systems in times of peace or to estimate victims and plan recoveries in times of emergency, such as disasters or pandemics.
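The basic idea of drawing a workplace from origin-destination counts can be sketched as follows. This is only a minimal illustration, not the method of the article: it samples a destination city in proportion to OD counts and ignores the industry dimension and the handling of restricted ODI cells. The function name, the city labels, and the counts are all hypothetical.

```python
import random


def assign_workplaces(home_cities, od_counts, seed=0):
    """Draw one workplace city per worker, proportional to OD counts.

    home_cities: list of home-city labels, one per worker.
    od_counts:   dict mapping home city -> {destination city: worker count}.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is repeatable
    workplaces = []
    for home in home_cities:
        destinations = list(od_counts[home])
        weights = [od_counts[home][d] for d in destinations]
        # random.choices samples with probability proportional to the weights.
        workplaces.append(rng.choices(destinations, weights=weights, k=1)[0])
    return workplaces


# Hypothetical OD table: 70% of workers living in "A" also work in "A".
example_od = {"A": {"A": 70, "B": 30, "C": 0}, "B": {"B": 1}}
example_assignment = assign_workplaces(["A"] * 1000 + ["B"] * 10, example_od)
```

Comparing the counts of assigned workplaces with the published OD table is then one way to check how closely an assignment follows the statistics.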

Recommended citation: Tadahiko Murata, Daiki Iwase, Takuya Harada: Workplace Assignment to Workers in Synthetic Populations in Japan, IEEE Transactions on Computational Social Systems, Vol. 10, No. 4, pp. 1914-1923 (2023)

Download here

仮想都市の統計情報による合成人口データの評価

Published in 計測自動制御学会論文集, 2022

In this paper, we propose an evaluation method for synthetic populations. In our previous method, we calculated the difference between actual statistics and synthetic statistics summarized from synthetic populations. If actual individual data were available, we could compare them with the synthetic population; however, such data are not available. This paper therefore proposes an evaluation method that calculates the difference between the synthetic population and a virtual population generated from actual statistics.

Recommended citation: 原田 拓弥, 村田 忠彦, 高橋 真吾: 仮想都市の統計情報による合成人口データの評価, 計測自動制御学会論文集, Vol. 58, No. 7, pp. 345-353 (2022)

Download here

市区町村の統計表を考慮した都道府県単位の個票データの合成

Published in 計測自動制御学会論文集, 2022

In this paper, we employ statistics of local governments (city, town, or village) in synthesizing individual households of a prefecture to reduce the error between synthesized statistics and actual statistics. Our previous method used estimated statistics of local governments with fewer than 200,000 people since those smaller local governments do not release their detailed statistics. To reduce the errors with prefectural statistics, we simultaneously employ statistics of smaller local governments in synthesizing population in a prefecture. Our experimental results show that we can make a 1/7 to 1/140 reduction in the error between synthesized statistics and actual statistics compared to the previous method.

Recommended citation: 原田 拓弥, 村田 忠彦: 市区町村の統計表を考慮した都道府県単位の個票データの合成, 計測自動制御学会論文集, Vol. 58, No. 6, pp. 281-289 (2022)

Download here

Social Awareness from Analysis of Available Time for Automated External Defibrillators in a City

Published in 2021 5th IEEE International Conference on Cybernetics (CYBCONF), 2021

In this paper, we investigate the available time of automated external defibrillators (AEDs) in a Japanese city. In Japan, there are about 600,000 AEDs available in cities or towns, but only 4.9% of AEDs are used by bystanders when citizens suffer a sudden cardiac arrest. One reason for such a low usage rate is their limited available time. Since many AEDs are installed by business owners, they are not available when the business is closed. We investigate the available time of AEDs in Takatsuki, Osaka, Japan. The number of AEDs that are available 24 hours a day can be increased by installing new AEDs in 24-hour convenience stores and in koban, which are small police stations staffed by police officers.

Recommended citation: Tadahiko Murata, Atsuki Fukushima, Takuya Harada, Mie Sasaki: Social Awareness from Analysis of Available Time for Automated External Defibrillators in a City, 2021 5th IEEE International Conference on Cybernetics (CYBCONF), p. (2021)

Download here

Impact of Reabsorption of Spilled Knowledge on Patent Value

Published in 2020 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), 2020

Innovation is often achieved by recombining existing knowledge. Thus, effective utilization of external knowledge is key for firms creating innovation. The concept of reabsorption, where an originating firm reabsorbs its spilled knowledge, including the advancements made by recipient firms, has attracted attention. Existing studies have not clarified the impact of reabsorption on patent value and have not confirmed whether an originating firm reabsorbs knowledge in the same or different technological fields. Therefore, this study attempts to examine the impact of reabsorption of knowledge on patent value considering technological fields using patent data. Reabsorption of spilled knowledge was tracked using patent citation data, and patent value was evaluated by the number of citations. By conducting negative binomial regression, we concluded that the reabsorption of spilled knowledge has a positive impact on patent value. Furthermore, the impact is greater when a firm reabsorbs spilled knowledge in different technological fields. Our results suggest that a firm has the ability to effectively create innovation by reabsorbing spilled knowledge.

Recommended citation: T. Miyazaki, R. Takemura, T. Harada, N. Ouchi: Impact of Reabsorption of Spilled Knowledge on Patent Value, 2020 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), p. (2020)

Download here

Distribution System for Japanese Synthetic Population Data with Protection Level

Published in 2020 International Conference on Machine Learning and Cybernetics (ICMLC), 2020

In this paper, we introduce a distribution system for synthesized data of the Japanese population using Interdisciplinary Large-scale Information Infrastructures in Japan. A synthetic population is synthesized based on the statistics of the census that is conducted by the government and publicly released. Therefore, the synthesized data contain no private data. However, it is easy to estimate household compositions and working status in a given area from the synthetic population. Therefore, we currently distribute the synthesized data only for public or academic purposes. Because it is important to encourage scholars and researchers to use large-scale household data for academic purposes, we define protection levels for the attributes in the synthetic populations. According to the protection levels, we distribute the data with the appropriate attributes to those who wish to use them. We encourage researchers to use the synthetic populations to become familiar with large-scale data processing.

Recommended citation: Tadahiko Murata, Susumu Date, Yusuke Goto, Toshihiro Hanawa, Takuya Harada, Manabu Ichikawa, Hao Lee, Masaharu Munetomo, Akiyoshi Sugiki: Distribution System for Japanese Synthetic Population Data with Protection Level, 2020 International Conference on Machine Learning and Cybernetics (ICMLC), p. (2020)

Download here

Synthetic and Distribution Method of Japanese Synthesized Population for Real-Scale Social Simulations

Published in The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, 2019

In this paper, we describe why synthesized populations are essential in real-scale social simulations (RSSS) and the current state of population synthesis for the whole population of Japan. RSSS are social simulations that use the real numbers of people or households. This paper describes how we have synthesized multiple sets of populations based on the statistics of each local government in the Japanese national censuses of 2000, 2005, 2010, and 2015. We have started to distribute those multiple sets of synthesized populations to RSSS researchers in Japan. In distributing the synthesized populations, we should protect personal or private information in them. We present some schemes for protecting this information using a cloud service or secure computation.

Recommended citation: Tadahiko MURATA, Takuya HARADA: Synthetic and Distribution Method of Japanese Synthesized Population for Real-Scale Social Simulations, The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, Vol. 33, pp. 1-3 (2019)

Download here

賃金構造基本統計調査に基づく合成世帯集団の労働者への所得の割当て

Published in システム制御情報学会論文誌, 2019

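This paper proposes a method for allocating income to the members of a synthetic population. A synthetic population is a set of individual records in which attributes such as the sex and age of family members are synthesized from real statistics. The proposed method uses simulated annealing (SA) to add employment attributes (employment status, employment type, industry classification, and enterprise size) to each member of the synthetic population and then allocates income to workers. Three statistical tables from the national census and a sample survey are used to allocate employment attributes and income; because each table aggregates family types differently, we propose a method for adjusting the number of workers per family type. Initial employment attributes are assigned based on the adjusted tables, and SA is then used to minimize the error against the statistics. Finally, income is allocated to each record according to its sex, age, employment type, industry classification, and enterprise size. The average income over all industries computed from the synthetic population was within an error of -0.8% to 10.3% of the income reported in the statistics.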

Recommended citation: 杉浦 翔, 村田 忠彦, 原田 拓弥: 賃金構造基本統計調査に基づく合成世帯集団の労働者への所得の割当て, システム制御情報学会論文誌, Vol. 32, No. 2, pp. 70-79 (2019)

Download here

Synthetic Method for Population of A Prefecture Using Statistics of Local Governments

Published in 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2018

In this paper, we employ statistics of local governments in synthesizing individual households of a prefecture in order to reduce the error between synthesized statistics and real statistics. In our previous method, we employed estimated statistics for local governments with populations of fewer than 200,000 people and separately synthesized the populations of the local governments in a prefecture. This causes large differences between the synthesized population and the real statistics. In order to reduce the errors with respect to prefectural statistics, we simultaneously employ the statistics of smaller local governments in synthesizing the population of a prefecture. Our experimental results show that the proposed method reduces errors by 42% to 84% compared with the previous method.

Recommended citation: Tadahiko Murata, Takuya Harada: Synthetic Method for Population of A Prefecture Using Statistics of Local Governments, 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1175-1180 (2018)

Download here

家族類型と世帯内の役割を考慮したSA法による大規模世帯の復元

Published in 計測自動制御学会論文集, 2018

In this paper, we modify a simulated annealing-based (SA-based) household synthesizing method in order to synthesize a population at the same scale as the target area. Micro-simulations (MS) and agent-based simulations (ABS) have recently been employed for social simulations. To enable MS or ABS, the composition of each household, such as the ages, occupations, or other properties of each member, must be prepared before the simulation. However, real household compositions are not available to researchers for privacy or security reasons. Therefore, we need to synthesize household compositions from available statistics for MS or ABS. It should be noted, however, that the synthesized population is just an artificial population that fits the employed statistics. In this paper, we modify an SA-based household synthesizing method based on statistics. We propose a household generation method, the use of 21 statistics, and an age exchange method for members of households. In our previous research, we employed nine statistics to synthesize populations. In this paper, we synthesize an artificial population from 21 statistics, and we show how errors between the artificial population and real statistics are reduced by the proposed algorithm.

Recommended citation: 原田 拓弥, 村田 忠彦, 枡井 大貴: 家族類型と世帯内の役割を考慮したSA法による大規模世帯の復元, 計測自動制御学会論文集, Vol. 54, No. 9, pp. 705-717 (2018)

Download here

並列計算を用いたSA法による都道府県レベルの大規模世帯の復元

Published in 計測自動制御学会論文集, 2018

In this paper, we employ parallel computing techniques to reconstruct large-scale household compositions for micro-simulations (MS) or agent-based simulations (ABS). MS and ABS have recently been employed for social simulations. To enable MS or ABS, the composition of each household, such as the ages, occupations, or other properties of each member, must be prepared before the simulation. However, real household compositions are not available to researchers for privacy or security reasons. Therefore, methods for reconstruction from statistics have been developed to generate artificial populations for MS or ABS. In this paper, we propose a method to reconstruct prefecture-level large-scale household compositions based on statistics using parallel computing techniques. In order to generate artificial populations as quickly as possible, parallel computing techniques are essential in reconstruction methods. We describe a challenge that arises when parallel computing is applied to a previously proposed reconstruction method, and we show the effectiveness of the proposed method.

Recommended citation: 原田 拓弥, 村田 忠彦: 並列計算を用いたSA法による都道府県レベルの大規模世帯の復元, 計測自動制御学会論文集, Vol. 54, No. 4, pp. 421-429 (2018)

Download here

社会シミュレーションのための異種並列計算環境における再現性の確保

Published in システム制御情報学会論文誌, 2018

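This paper proposes a framework that yields identical simulation results across different computing environments for social simulations that use random numbers. In social simulations, various parameters are changed in order to observe the average behavior of the results and to extract and analyze anomalous trials. Searching for anomalous trials requires large-scale simulations and many repeated runs, so speedup is necessary. However, parallelization, one of the most effective speedup techniques, changes the simulation results because the random number sequences used differ with the degree of parallelism. To reproduce an anomalous phenomenon and analyze it in detail, the reproducibility of the simulation must be ensured. For social simulation models in which the states up to period t-1 can be shared across nodes and each agent can independently make its decision for period t based on information up to period t-1, we propose a framework for obtaining identical experimental results even when multiple computing environments with CPUs and GPUs are used. In our experiments, identical results were obtained in all computing environments tested.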

Recommended citation: 原田 拓弥, 村田 忠彦: 社会シミュレーションのための異種並列計算環境における再現性の確保, システム制御情報学会論文誌, Vol. 31, No. 2, pp. 37-48 (2018)

Download here

Comparing Transition Procedures in Modified Simulated-Annealing-Based Synthetic Reconstruction Method Without Samples

Published in SICE Journal of Control, Measurement, and System Integration, 2017

In this paper, we modify a synthetic reconstruction (SR) method without samples. The synthetic reconstruction method generates attributes of a population, such as age, sex, and kinship within a family, according to available statistics. Although the original SR method employs individual samples that were collected to compile statistics, it has been criticized because the generated attributes are limited to those found in the samples used in the reconstruction process. In this paper, we employ a simulated annealing-based SR method without samples. We compare two ways of generating a candidate solution in the simulated annealing search: changing the age of an agent (age-change) or swapping the ages of two agents (age-swap). Results of synthetic reconstruction show that age-change is better when the number of search iterations is limited. On the other hand, age-swap is better when a sufficient number of search iterations is available for reconstructing a population.
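The two candidate-generation moves compared in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: agents are reduced to hypothetical (sex, age) pairs, the objective is the absolute error against a hypothetical table of counts per sex and ten-year age band, and the cooling schedule is an arbitrary choice.

```python
import math
import random
from collections import Counter


def error(agents, target):
    """Sum of absolute differences between agent counts and target counts."""
    counts = Counter((sex, age // 10) for sex, age in agents)
    keys = set(counts) | set(target)
    return sum(abs(counts[k] - target.get(k, 0)) for k in keys)


def age_change(agents, rng):
    """Candidate move: give one randomly chosen agent a new random age."""
    i = rng.randrange(len(agents))
    candidate = list(agents)
    candidate[i] = (agents[i][0], rng.randint(0, 99))
    return candidate


def age_swap(agents, rng):
    """Candidate move: swap the ages of two randomly chosen agents."""
    i, j = rng.sample(range(len(agents)), 2)
    candidate = list(agents)
    candidate[i] = (agents[i][0], agents[j][1])
    candidate[j] = (agents[j][0], agents[i][1])
    return candidate


def anneal(agents, target, move, steps, seed=0, t0=2.0, cooling=0.999):
    """Simulated annealing with the given move; returns the best state found."""
    rng = random.Random(seed)
    current, cur_err = list(agents), error(agents, target)
    best, best_err = current, cur_err
    temp = t0
    for _ in range(steps):
        candidate = move(current, rng)
        cand_err = error(candidate, target)
        # Always accept improvements; accept worse states with Boltzmann probability.
        if cand_err <= cur_err or rng.random() < math.exp((cur_err - cand_err) / temp):
            current, cur_err = candidate, cand_err
            if cur_err < best_err:
                best, best_err = current, cur_err
        temp = max(temp * cooling, 1e-9)  # keep the temperature strictly positive
    return best, best_err
```

Note that age-swap keeps the overall multiset of ages unchanged and only rearranges ages among agents, whereas age-change can alter the age distribution itself; passing `age_change` or `age_swap` as `move` switches between the two.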

Recommended citation: Tadahiko MURATA, Takuya HARADA, Daiki MASUI: Comparing Transition Procedures in Modified Simulated-Annealing-Based Synthetic Reconstruction Method Without Samples, SICE Journal of Control, Measurement, and System Integration, Vol. 10, No. 6, pp. 513-519 (2017)

Download here

Projecting Households of Synthetic Population on Buildings Using Fundamental Geospatial Data

Published in SICE Journal of Control, Measurement, and System Integration, 2017

In this paper, we propose a method to project households of synthetic population using fundamental geospatial data for real-world social simulations. That is, we assign each generated household on a building in a geographical map. When we try to conduct a real-scale social simulation, we need attributes of agents and their locations on a geographical map. We have already proposed a synthetic population method that generates attributes of agents or citizens from the statistics of the real world. To determine the locations of agents, we propose, in this paper, a threefold method to project generated households on buildings in a geographical map using the fundamental geospatial data. We apply the proposed method to project households generated from the statistics of Takatsuki City, Osaka, Japan and project them on buildings in the map. In order to cope with a problem of random assignment of households on buildings, we propose a modified method to consider types and area of buildings. Projection results show that households are assigned more reasonably to isolated houses and apartments.

Recommended citation: Takuya HARADA, Tadahiko MURATA: Projecting Households of Synthetic Population on Buildings Using Fundamental Geospatial Data, SICE Journal of Control, Measurement, and System Integration, Vol. 10, No. 6, pp. 505-512 (2017)

Download here

Nation-wide synthetic reconstruction method

Published in 2017 IEEE Symposium Series on Computational Intelligence (SSCI), 2017

In this paper, we improve a synthetic reconstruction (SR) method without samples to synthesize the population of a whole nation by modifying the method for generating the initial population. We modify our simulated annealing-based household reconstruction method in three respects to reconstruct the populations of all prefectures in Japan. We consider more members in each household, the numbers of males and females in households of “couple, children and a grandparent,” and the parents of both husband and wife in “couples and parents (or a parent).” The reconstruction results show that the parents of both husband and wife should be considered. They also show that considering the statistics on “unmarried male or female” or the statistics on the number of children in “couple, children and grandparents” is useful in determining the numbers of males and females in “couple, children and a grandparent.”

Recommended citation: Tadahiko Murata, Takuya Harada: Nation-wide synthetic reconstruction method, 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1-6 (2017)

Download here

Income allocation to each worker in synthetic populations using basic survey on wage structure

Published in 2017 IEEE Symposium Series on Computational Intelligence (SSCI), 2017

In this paper, we propose a simulated annealing-based method to allocate an income attribute to each worker in synthetic populations. Income is one of the important attributes when microsimulations or agent-based simulations are conducted to make or examine policies of governments, enterprises, or firms. We add an income attribute to workers in individual households using the Basic Survey on Wage Structure in Japan. To add that attribute, we first prepare synthetic populations of households whose members' sex, age, family type, role, and kinship have already been determined by our previously proposed synthetic population generation method (SPGM). Then, using a simulated annealing-based SPGM, we add a working status (working or not working) and, for members who are working, an industry type, according to three statistics that show the relation between sex, family type, and age in a prefecture or a city. After determining the working status and industry, we add an average monthly income to each worker in the synthesized population. To check the validity of the allocated monthly income, we compare the average income of each industry in the synthesized population with statistics on the average income of each industry that were not used in the synthesizing procedure.

Recommended citation: Tadahiko Murata, Sho Sugiura, Takuya Harada: Income allocation to each worker in synthetic populations using basic survey on wage structure, 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1-6 (2017)

Download here

Projecting Synthetic Households on Buildings using Fundamental Geospatial Data

Published in Social Simulation Conference 2017, 2017

In this paper, we propose a method to project households of a synthetic population using fundamental geospatial data for real-world social simulation. That is, we assign each generated household to a building on a geographical map. A synthetic population is a population that is generated from the statistics of the real world. The synthetic reconstruction method has been proposed to generate a synthetic population based on the statistics. We propose a threefold method to project generated households onto buildings on a geographical map using the fundamental geospatial data. We apply the proposed method to households generated from the statistics of Takatsuki City, Osaka, and project them onto buildings on the map. In order to cope with the problem of random assignment of households to buildings, we modify our proposed method to consider the types and areas of buildings. Projection results show that households are assigned more reasonably to isolated houses and apartments.

Recommended citation: Takuya HARADA, Tadahiko MURATA: Projecting Synthetic Households on Buildings using Fundamental Geospatial Data, Social Simulation Conference 2017, pp. 1-12 (2017)

Reproducible large-scale social simulations on various computing environment

Published in 2017 Joint 17th World Congress of International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems (IFSA-SCIS), 2017

In this paper, we propose parallel computing techniques for reproducible large-scale social simulations on various computing environments, including CPUs (Central Processing Units) and GPUs (Graphics Processing Units). When we use computing resources for large-scale social simulations, the reproducibility of a simulation should be considered. “Reproducibility” means that the same trial of a simulation can be repeated. If the same computing resources are available to repeat the trial, it is easy to reproduce the same simulation results. When not all of the same computing resources are available, however, it becomes difficult to obtain the same trial, since the random number generators may differ from those on the original computing resources. In this study, we employ multi-thread computing on CPU or GPU. We propose two models to run reproducible social simulations on CPU or GPU. One is to parallelize trials (Trial Parallelization). The other is to parallelize the agents of a single simulation (Agent Parallelization). These models ensure reproducibility even on different computing resources. Our experimental results show that the same computing processes are obtained on CPU and GPU. When we parallelize a large-scale social simulation on CPU or GPU, we can also accelerate the simulation as a secondary effect.
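One generic way to make agent-level parallelism independent of the number of workers is to give every agent its own random stream derived from the trial seed, the agent ID, and the time step. The sketch below illustrates that idea only; it is an assumption for illustration and is not claimed to be the mechanism used in the paper, and all function names are hypothetical.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor


def agent_rng(trial_seed, agent_id, step):
    """Build a generator whose seed depends only on (trial, agent, step)."""
    key = f"{trial_seed}:{agent_id}:{step}".encode()
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed)


def decide(trial_seed, agent_id, step):
    """Hypothetical agent decision: a binary choice drawn from its own stream."""
    return agent_rng(trial_seed, agent_id, step).randint(0, 1)


def run_step(trial_seed, step, num_agents, num_workers):
    """Evaluate all agents for one step using the given number of threads."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map returns results in agent order regardless of scheduling.
        return list(pool.map(lambda a: decide(trial_seed, a, step), range(num_agents)))
```

Because no agent ever draws from a shared generator, `run_step(42, 0, 100, 1)` and `run_step(42, 0, 100, 8)` produce the same list of decisions, whatever order the threads run in.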

Recommended citation: Takuya Harada, Tadahiko Murata: Reproducible large-scale social simulations on various computing environment, 2017 Joint 17th World Congress of International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems (IFSA-SCIS), pp. 1-5 (2017)

Download here

Modified SA-based household reconstruction from statistics for agent-based social simulations

Published in 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2016

In this paper, we modify a household reconstruction method using simulated annealing (SA) for micro-simulations (MS) or agent-based simulations (ABS). MS and ABS have recently been employed for social simulations. To enable MS or ABS, the composition of each household, such as the ages, occupations, or other properties of each member, must be prepared before the simulation. However, real household compositions are not available to researchers for privacy or security reasons. Therefore, we need to reconstruct household compositions from available statistics for MS or ABS. It should be noted, however, that the reconstructed population is just an artificial population that fits the employed statistics. That is, the generated population is not necessarily the real population that produced the statistics. We modify the SA-based household reconstruction method in three respects: the procedure for generating the initial solution, the transition procedure in SA, and an evaluation procedure that uses more statistics. Through simulation results, we show how errors between the artificial population and real statistics are reduced by the proposed modifications.

Recommended citation: Tadahiko Murata, Takuya Harada, Daiki Masui: Modified SA-based household reconstruction from statistics for agent-based social simulations, 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 3600-3605 (2016)

Download here

Large Scale Social Simulation with more Than a Hundred Million Agents

Published in 2013 6th International Conference on Emerging Trends in Engineering and Technology, 2013

In this study, we propose a coding method that enables a huge number of agents in a social simulation such as the minority game. The minority game is a game in which participants (or agents) win when they select the group with the smaller number of participants. Using our coding method, we successfully implement the minority game with more than a hundred million agents. In our simulation results, we observed a cycle that varies according to the memory size of each agent and that cannot be observed with a smaller number of agents. We present some simulation results that show those cycles.
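For readers unfamiliar with the minority game, a small textbook-style version is sketched below. It is an assumption-laden illustration, not the coding method proposed in the paper: each agent holds a few fixed random strategies that map the recent history of winning sides to a choice, and it plays whichever strategy has scored best so far. All names and parameter values are hypothetical.

```python
import random


def play_minority_game(num_agents, memory, num_strategies, rounds, seed=0):
    """Run a basic minority game; return how many agents chose side 1 each round.

    Use an odd num_agents so that a strict minority always exists.
    """
    rng = random.Random(seed)
    num_histories = 2 ** memory
    # strategies[a][s][h]: choice (0 or 1) of agent a's strategy s for history h.
    strategies = [
        [[rng.randint(0, 1) for _ in range(num_histories)] for _ in range(num_strategies)]
        for _ in range(num_agents)
    ]
    scores = [[0] * num_strategies for _ in range(num_agents)]
    history = rng.randrange(num_histories)  # last `memory` winning sides as bits
    attendance = []
    for _ in range(rounds):
        choices = []
        for a in range(num_agents):
            # Each agent plays its currently best-scoring strategy.
            best = max(range(num_strategies), key=lambda s: scores[a][s])
            choices.append(strategies[a][best][history])
        ones = sum(choices)
        minority = 1 if ones < num_agents - ones else 0
        # Reward every strategy that would have picked the minority side.
        for a in range(num_agents):
            for s in range(num_strategies):
                if strategies[a][s][history] == minority:
                    scores[a][s] += 1
        history = ((history << 1) | minority) % num_histories
        attendance.append(ones)
    return attendance
```

A call such as `play_minority_game(101, 3, 2, 50, seed=1)` returns the per-round attendance of side 1, which is the kind of series in which cycles would be looked for. This list-based version stores every strategy table explicitly, so it is far too memory-hungry for the hundred-million-agent scale discussed in the abstract.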

Recommended citation: Tadahiko Murata, Takuya Harada: Large Scale Social Simulation with more Than a Hundred Million Agents, 2013 6th International Conference on Emerging Trends in Engineering and Technology, pp. 159-163 (2013)

Download here