[Lustre] [kernel 5.10] ldiskfs_find_dest_de bad entry in directory

Description

Running the io500 benchmark failed.

Client side log

[openeuler@oe2203-test io500]$ sudo OMPI_ALLOW_RUN_AS_ROOT=1 OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 ./io500.sh config-all.ini 100
IO500 version io500-sc22_v2 (standard)
[RESULT]       ior-easy-write        0.105593 GiB/s : time 338.211 seconds
ERROR: open64("/mnt/lustre/datafiles/2023.02.14-10.12.17/mdtest-easy/test-dir.0-0/mdtest_tree.0.0/file.mdtest.1.85", 66, 0664) failed. Error: Read-only file system, (aiori-POSIX.c:569)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode -1.

 

Server side log

[ 9962.007724] LDISKFS-fs error (device dm-0): ldiskfs_find_dest_de:2412: inode #5767170: block 3771253: comm mdt00_000: bad entry in directory: rec_len is smaller than minimal - offset=0, inode=0, rec_len=8, name_len=0, size=4096
[ 9962.051171] Aborting journal on device dm-0-8.
[ 9962.058456] LDISKFS-fs (dm-0): Remounting filesystem read-only
[ 9962.059877] LDISKFS-fs error (device dm-0) in iam_txn_add:547: Journal has aborted
[ 9962.064365] LustreError: 11366:0:(osd_io.c:2222:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
[ 9962.066805] LustreError: 11366:0:(llog_cat.c:592:llog_cat_add_rec()) llog_write_rec -30: lh=00000000c04e4ff3
[ 9962.069137] LustreError: 11366:0:(tgt_lastrcvd.c:1326:tgt_add_reply_data()) lustre-MDT0000: can't update reply_data file: rc = -30
[ 9962.071742] LustreError: 11366:0:(osd_handler.c:2089:osd_trans_stop()) lustre-MDT0000: failed in transaction hook: rc = -30
[ 9962.074184] LustreError: 11366:0:(osd_handler.c:2099:osd_trans_stop()) lustre-MDT0000: failed to stop transaction: rc = -30
[ 9962.074274] LustreError: 11348:0:(osd_handler.c:1789:osd_trans_commit_cb()) transaction @0x00000000c73ec34c commit error: 2

 

Similar upstream bug: https://jira.whamcloud.com/browse/LU-12268

Activity

Xinliang Liu 
March 22, 2023 at 2:08 AM

Closed as patch merged.

Xinliang Liu 
March 3, 2023 at 7:47 AM

Xinliang Liu 
February 27, 2023 at 2:10 AM
(edited)

This issue seems to be related to the following two pieces of code:

Part1 (introduced by commit f94c02917f1d "ext4: avoid cycles in directory h-tree"):

block = dx_get_block(at);
for (i = 0; i <= level; i++) {
	if (blocks[i] == block) {
		ext4_warning_inode(dir,
			"dx entry: tree cycle block %u points back to block %u",
			blocks[level], block);
		goto fail;
	}
}
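
To make the intent of the Part1 check easier to follow, here is a minimal userspace sketch of the same idea: remember every index block visited on the way down and refuse to descend into one that was already seen. The names dx_path_has_cycle, descend, child_of and MAX_DX_LEVELS are hypothetical, and the toy child_of array stands in for dx_get_block(); the real dx_probe does considerably more (hash lookup, buffer reads, and the locking added by ext4-pdirop.patch).

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_DX_LEVELS 3	/* assumed bound on index depth for this sketch */

/* True if `block` already appears on the path blocks[0..level]. */
static bool dx_path_has_cycle(const uint32_t blocks[], int level,
			      uint32_t block)
{
	for (int i = 0; i <= level; i++) {
		if (blocks[i] == block)
			return true;
	}
	return false;
}

/* Toy descent: child_of[b] is the block that index block b points to.
 * Starts at root block 0 and follows indirect + 1 pointers.
 * Requires indirect < MAX_DX_LEVELS.
 * Returns the block reached, or -1 if the path loops back on itself. */
static long descend(const uint32_t child_of[], int indirect)
{
	uint32_t blocks[MAX_DX_LEVELS] = {0};	/* blocks[0] is the root */
	uint32_t cur = 0;
	int level = 0;

	for (;;) {
		uint32_t block = child_of[cur];

		if (dx_path_has_cycle(blocks, level, block))
			return -1;
		if (++level > indirect)
			return (long)block;
		blocks[level] = block;	/* record the path before going down */
		cur = block;
	}
}
```

Note that in this sketch a block is recorded in blocks[] only after the cycle check and the level increment, so any code that loops back to retry the same level has to be placed carefully relative to those steps. Whether that is the exact interaction behind this bug is an assumption, not something confirmed here.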

Part2 (introduced by ext4-pdirop.patch)

if (indirect == level) { /* the last index level */
	struct ext4_dir_lock_data *ld;
	u64 myblock;

	/* By default we only lock DE-block, however, we will
	 * also lock the last level DX-block if:
	 * a) there is hash collision
	 *    we will set DX-lock flag (a few lines below)
	 *    and redo to lock DX-block
	 *    see detail in dx_probe_hash_collision()
	 * b) it's a retry from splitting
	 *    we need to lock the last level DX-block so nobody
	 *    else can split any leaf blocks under the same
	 *    DX-block, see detail in ext4_dx_add_entry()
	 */
	if (ext4_htree_dx_locked(lck)) {
		/* DX-block is locked, just lock DE-block
		 * and return */
		ext4_htree_spin_unlock(lck);
		if (!ext4_htree_safe_locked(lck))
			ext4_htree_de_lock(lck, frame->at);
	...
	if (myblock == EXT4_HTREE_NODE_CHANGED) {
		/* someone split this DE-block before
		 * I locked it, I need to retry and lock
		 * valid DE-block */
		ext4_htree_de_unlock(lck);
		continue;
	}
	return frame;
}

 

After moving the Part2 code so that it comes after the Part1 code, the issue no longer occurs.

Xinliang Liu 
February 22, 2023 at 2:50 AM
(edited)

Bisection points to the following commit and patch:

f94c02917f1d ext4: avoid cycles in directory h-tree (included in the openEuler 22.03 LTS kernel, kernel-5.10.0-60.58.0.86.oe2203)
ldiskfs/kernel_patches/patches/oe2203/ext4-pdirop.patch (Lustre ldiskfs patch on ext4)

Workaround:

Revert commit f94c02917f1d "ext4: avoid cycles in directory h-tree" and update ext4-pdirop.patch.

With the workaround applied, the io500 test suite completes:

[openeuler@oe2203-test io500]$ sudo ./io500 config-minimal.ini
IO500 version io500-sc22_v2 (standard)
[RESULT]       ior-easy-write        0.103132 GiB/s : time 316.294 seconds
[RESULT]    mdtest-easy-write        0.067036 kIOPS : time 301.645 seconds
[      ]            timestamp        0.000000 kIOPS : time 0.000 seconds
[RESULT]       ior-hard-write        0.101985 GiB/s : time 312.619 seconds
[RESULT]    mdtest-hard-write        0.054293 kIOPS : time 301.826 seconds
[RESULT]                 find        3.992785 kIOPS : time 9.124 seconds
[RESULT]        ior-easy-read        0.023636 GiB/s : time 1380.092 seconds
[RESULT]     mdtest-easy-stat        0.107839 kIOPS : time 187.558 seconds
[RESULT]        ior-hard-read        0.022159 GiB/s : time 1438.550 seconds
[RESULT]     mdtest-hard-stat        0.203911 kIOPS : time 81.015 seconds
[RESULT]   mdtest-easy-delete        0.106105 kIOPS : time 190.760 seconds
[RESULT]     mdtest-hard-read        0.065468 kIOPS : time 250.149 seconds
[RESULT]   mdtest-hard-delete        0.103164 kIOPS : time 159.408 seconds
[SCORE ] Bandwidth 0.048447 GiB/s : IOPS 0.147904 kiops : TOTAL 0.084649
The result files are stored in the directory: ./results/2023.02.22-01.44.09
[openeuler@oe2203-test io500]$ uname -r
5.10.0-60.79.0.103debug.oe2203.aarch64

 

Commit f94c02917f1d itself should be fine; we probably need to adjust ext4-pdirop.patch instead.

Done

Details

Created February 15, 2023 at 2:41 AM
Updated April 7, 2023 at 3:07 PM
Resolved March 22, 2023 at 2:08 AM