NumPy Files: Reading

This page explains how to read files with NumPy in Python.

Reading files with NumPy

For reading numerical data files, NumPy's genfromtxt is very convenient. The data it reads is returned as a NumPy array, so further processing is easy.

Reading with genfromtxt

Reading example
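A minimal sketch of a basic read, assuming a small comma-separated table with one header line. The sample data is hypothetical and is wrapped in io.StringIO so the example runs without a file; a path such as "data.csv" (also hypothetical) could be passed as fname instead. Because the result is an ndarray, column operations such as a mean follow directly.

```python
from io import StringIO

import numpy as np

# Hypothetical sample data: a header line, data rows, and one comment line.
text = """x,y
1.0,2.0
# comment lines are ignored (comments='#' by default)
2.0,4.0
3.0,6.0
"""

# delimiter=',' splits fields on commas; skip_header=1 skips the "x,y" line.
data = np.genfromtxt(StringIO(text), delimiter=',', skip_header=1)

print(data.shape)         # (3, 2)
print(data[:, 1].mean())  # 4.0, the mean of the second column
```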

Arguments

genfromtxt reads data from a file. It returns an ndarray (a masked array when usemask=True).

numpy.genfromtxt(fname, dtype=<type 'float'>, comments='#',
      delimiter=None, skip_header=0, skip_footer=0,
      converters=None, missing_values=None, filling_values=None,
      usecols=None, names=None, excludelist=None, deletechars=None,
      replace_space='_', autostrip=False, case_sensitive=True,
      defaultfmt='f%i', unpack=None, usemask=False,
      loose=True, invalid_raise=True, max_rows=None)

numpy.genfromtxt arguments. All arguments except fname are optional.
Argument	Type	Default	Meaning
fname	file, str, pathlib.Path, list of str, generator	None (required)	File, file name, list, or generator to read. If the file name extension is .gz or .bz2, the file is decompressed first.
dtype	dtype	float	Data type of the resulting array. If None, the dtypes will be determined by the contents of each column, individually.
comments	str	'#'	The character used to indicate the start of a comment. All the characters occurring on a line after a comment are discarded.
delimiter	str, int, sequence	None	The string used to separate values. By default, any run of consecutive whitespace acts as the delimiter. An integer or sequence of integers can also be provided as width(s) of each field.
skip_header	int	0	The number of lines to skip at the beginning of the file.
skip_footer	int	0	The number of lines to skip at the end of the file.
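converters	variable	None	The set of functions that convert the data of a column to a value. Converters can also be used to provide a default value for missing data, e.g. converters = {3: lambda s: float(s or 0)}.
missing_values	variable	None	The set of strings corresponding to missing data.
filling_values	variable	None	The set of values to be used as default when the data are missing.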
usecols	sequence	None	Which columns to read, with 0 being the first. For example, usecols = (1, 4, 5) will extract the 2nd, 5th and 6th columns.
names	{None, True, str, sequence}	None	If names is True, the field names are read from the first valid line after the first skip_header lines. If names is a sequence or a single-string of comma-separated names, the names will be used to define the field names in a structured dtype. If names is None, the names of the dtype fields will be used, if any.
excludelist	sequence	None	A list of names to exclude. This list is appended to the default list ['return', 'file', 'print']. Excluded names have an underscore appended: for example, file would become file_.
deletechars	str	None	A string combining invalid characters that must be deleted from the names.
replace_space	char	'_'	Character(s) used to replace white spaces in the variable names. By default, '_' is used.
autostrip	bool	False	Whether to automatically strip white spaces from the variables.
case_sensitive	{True, False, ‘upper’, ‘lower’}	True	If True, field names are case sensitive. If False or ‘upper’, field names are converted to upper case. If ‘lower’, field names are converted to lower case.
defaultfmt	str	'f%i'	A format used to define default field names, such as “f%i” or “f_%02i”.
unpack	bool	None	If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = genfromtxt(...).
usemask	bool	False	If True, return a masked array. If False, return a regular array.
loose	bool	True	If True, do not raise errors for invalid values.
invalid_raise	bool	True	If True, an exception is raised if an inconsistency is detected in the number of columns. If False, a warning is emitted and the offending lines are skipped.
max_rows	int	None	The maximum number of rows to read. Must not be used with skip_footer at the same time. If given, the value must be at least 1. Default is to read the entire file.
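The sketch below exercises several parameters from the table on one small, hypothetical CSV string (the column names freq, s11, and s21 are made up for illustration). The second data row has an empty last field, which genfromtxt treats as missing; the three calls show names with filling_values, usecols with unpack, and a per-column converter.

```python
from io import StringIO

import numpy as np

# Hypothetical CSV with a header line; the last field of row 2 is empty (missing).
text = """freq,s11,s21
1.0,-10.5,-3.0
2.0,-12.0,
3.0,-9.5,-4.0
"""

# names=True: field names come from the first line, giving a structured array.
# filling_values: the value substituted for missing entries.
table = np.genfromtxt(StringIO(text), delimiter=',', names=True,
                      filling_values=-999.0)
print(table.dtype.names)  # ('freq', 's11', 's21')
print(table['s21'])       # the missing entry is replaced by -999.0

# usecols selects columns 0 and 2; unpack=True transposes the result so the
# columns unpack into separate 1-D arrays. Missing floats default to nan.
freq, s21 = np.genfromtxt(StringIO(text), delimiter=',', skip_header=1,
                          usecols=(0, 2), unpack=True)

# A converter for column 2 that maps an empty field to 0.0.
converted = np.genfromtxt(StringIO(text), delimiter=',', skip_header=1,
                          converters={2: lambda s: float(s or 0)})
print(converted[1, 2])    # 0.0
```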

Application

When a single file contains several data sets, each one can be read separately with genfromtxt. The following concrete program does this for CST-Studio output.

import argparse
import re
from io import StringIO

import numpy as np

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Show the minimum of S21 in CST-Studio output')
    parser.add_argument('-i', '--input', required=True,
                        help='input file (CST output)')
    args = parser.parse_args()

    # Read the whole file into a single string
    with open(args.input) as cst:
        results = cst.read()

    # Data sets are separated by "#-----" lines; the regular expression
    # matches '#' followed by four or more '-'
    data_sets = re.split(r'#-{4,}', results)

    arrays = []
    for data_set in data_sets:
        if data_set.strip():    # ignore empty sections
            # StringIO lets genfromtxt treat the string like a file
            data = np.genfromtxt(StringIO(data_set))
            arrays.append(data)
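The program above collects one array per data set but stops before computing the minimum that its description mentions. The sketch below applies the same split-and-read approach to an inline string and then locates the S21 minimum with np.argmin. The sample text, the two-column layout (frequency in GHz, S21 in dB), and the helper has_data are assumptions for illustration, not actual CST-Studio output. It also skips sections that contain only '#' lines, since genfromtxt emits an "Empty input file" warning and returns an empty array for input with no data rows.

```python
import re
from io import StringIO

import numpy as np

# Hypothetical input: two data sets, each introduced by '#' header lines
# and a "#-----" separator. Real CST-Studio files may differ.
results = """#Parameters = {a=1}
#"Frequency / GHz"   "S21 / dB"
#-----------------------------
1.0   -3.2
2.0   -5.8
3.0   -4.1

#Parameters = {a=2}
#"Frequency / GHz"   "S21 / dB"
#-----------------------------
1.0   -2.9
2.0   -3.5
3.0   -6.7
"""


def has_data(section: str) -> bool:
    """Hypothetical helper: True if the section has a non-blank, non-comment line."""
    return any(line.strip() and not line.lstrip().startswith('#')
               for line in section.splitlines())


# Split at '#' followed by four or more '-', then read each data section.
arrays = [np.genfromtxt(StringIO(section))
          for section in re.split(r'#-{4,}', results)
          if has_data(section)]

# Assumed layout: column 0 = frequency, column 1 = S21.
for i, data in enumerate(arrays):
    idx = np.argmin(data[:, 1])
    print(f"data set {i}: min S21 = {data[idx, 1]} dB at {data[idx, 0]} GHz")
```

Filtering with has_data before calling genfromtxt keeps header-only sections out of the list, so every array that reaches np.argmin has the expected two columns.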

Page information

References

numpy.genfromtxt — NumPy v1.12 Manual gives a detailed description.

Revision history

June 11, 2017

Page created